Code
pacman::p_load(readxl, openxlsx, data.table, tidyverse, timetk, plotly, forecast, ggplot2, CGPfunctions, rnaturalearth, rnaturalearthdata, sf, viridis, ggrepel, ggHoriPlot, ggthemes, tidymodels, timetk, modeltime)Andrea Yeo
February 24, 2025
March 8, 2025
All values are in Million Dollars
Since Mr. Donald Trump assumed office as the President of the United States on January 20, 2025, global trade has been a highly scrutinized topic. Given Singapore’s role as a key global trade hub, understanding its trade dynamics is crucial in assessing the potential impact of shifting geopolitical and economic policies.
In this take-home exercise, we apply newly acquired data visualization and analytical techniques to explore Singapore’s Trade in Services - Singapore’s exports and imports of services between Singapore and the rest of the world. By leveraging statistical tools and visualization methods, this study aims to uncover key trends, service category distributions, and trade imbalances while providing deeper insights into how different service sectors contribute to Singapore’s economy.
Using data from the Department of Statistics Singapore, DOS on Trade In Services By Services Category, this analysis applies Exploratory Data (EDA) and data visualization techniques to:
The following R packages will be loaded for this exercise using pacman::p_load():
The original visualizations analyzed in this analysis are sourced from here.

In the below section, we will evaluate the effectiveness of the above visualization by identifying its pros and cons, focusing on aspects such as (1) clarity, and (2) visual appeal:
| Pros | Cons | Suggested fixes |
|---|---|---|
| Clear categorization of exports and imports - The chart effectively differentiates between exports and imports | Year-specific colors make trend analysis difficult – Each year is assigned a different color, making it hard to track trends across time. | Use consistent colors for exports and imports across all years (Implemented by converting them into line charts). |
| Total trade values are highlighted – The total trade (Exports + Imports) is displayed for each year. | Floating “Total” labels can be easily overlooked – They are placed above bars separately, which may lead to misinterpretation. | Introduce a line chart for total trade trends instead of floating labels. |
| Growth rate (CAGR) and trade balance data are included – Additional insights are provided. | Trade balance is not well integrated – It is displayed separately at the bottom instead of being visually linked to the bars. | Represent trade balance directly as a bar chart (Green for surplus, Red for deficit) |
| Pros | Cons | Suggested fixes |
|---|---|---|
| Engaging use of colors and icons – Makes the visualization appealing and eye-catching. | Overuse of colors creates clutter – Different colors for each year make it visually overwhelming. | Reduce unnecessary color variations and simplify color coding. |
The code chunk below imports the Trade In Services By Services Category dataset, downloaded from Department of Statistics Singapore, DOS, using the read_excel() function from the readxl package.
glimpse(): provides a transposed overview of a dataset, showing variables and their types in a concise format.head(): displays the first few rows of a dataset (default is 6 rows) to give a quick preview of the data.summary(): generates a statistical summary of each variable, including measures like mean, median, and range for numeric data.duplicated():returns a logical vector indicating which elements or rows in a vector or data frame are duplicates.colSums(is.na()): counts the number of missing values (NA) in each column of the data frame.str(): use str() to display the column names, data types, and a preview of the data.Rows: 51
Columns: 26
$ `Data Series` <chr> "Total Trade In Services", "Exports Of Services", "Manuf…
$ `2024` <dbl> 997749.8, 528568.3, 471.0, 10820.1, 172971.1, 145993.3, …
$ `2023` <dbl> 919117.0, 481009.2, 490.7, 10981.0, 149537.4, 124222.5, …
$ `2022` <dbl> 877252.5, 468190.5, 685.5, 10117.9, 189649.0, 167209.2, …
$ `2021` <dbl> 715093.5, 382492.4, 489.9, 8833.5, 144220.0, 130499.3, 1…
$ `2020` <dbl> 590035.1, 300004.6, 236.8, 8172.9, 93560.0, 79857.9, 137…
$ `2019` <dbl> 592824.3, 307215.9, 266.9, 9663.3, 94272.1, 76545.7, 177…
$ `2018` <dbl> 561271.0, 287141.4, 370.3, 9410.0, 88888.8, 71746.4, 171…
$ `2017` <dbl> 493353.8, 241568.0, 247.1, 7712.0, 69993.5, 54547.9, 154…
$ `2016` <dbl> 431109.3, 211835.8, 284.8, 8418.5, 59213.4, 45873.0, 133…
$ `2015` <dbl> 432922.3, 210622.7, 346.5, 9315.2, 64097.1, 50798.1, 132…
$ `2014` <dbl> 406020.8, 194843.2, 424.4, 9853.1, 63918.8, 50917.2, 130…
$ `2013` <dbl> 365055.0, 177719.3, 283.2, 10767.2, 57830.9, 45929.4, 11…
$ `2012` <dbl> 327866.3, 161769.2, 249.6, 9053.1, 55586.3, 42864.3, 127…
$ `2011` <dbl> 298227.8, 150013.0, 260.4, 9342.9, 53523.0, 41416.7, 121…
$ `2010` <dbl> 273929.7, 136872.3, 289.5, 8648.4, 52606.6, 41214.6, 113…
$ `2009` <dbl> 238962.6, 117832.0, 323.4, 9128.1, 43365.7, 33042.6, 103…
$ `2008` <dbl> 254282.0, 126155.0, 452.1, 8354.6, 51108.8, 38561.9, 125…
$ `2007` <dbl> 223936.4, 110796.6, 492.6, 6605.6, 43642.8, 31104.6, 125…
$ `2006` <dbl> 195966.0, 92674.8, 534.7, 5701.0, 35877.1, 24748.0, 1112…
$ `2005` <dbl> 167532.8, 75904.8, 315.0, 4797.7, 32435.1, 21286.1, 1114…
$ `2004` <dbl> 150911.1, 66795.8, 353.2, 3450.5, 28630.3, 18702.3, 9928…
$ `2003` <dbl> 122427.9, 52966.2, 303.1, 2883.1, 23343.2, 15022.2, 8321…
$ `2002` <dbl> 109710.5, 49936.3, 369.5, 3071.8, 21539.3, 12656.8, 8882…
$ `2001` <dbl> 102892.5, 46286.1, 195.0, 2099.5, 20497.9, 12079.2, 8418…
$ `2000` <dbl> 96452.4, 44854.8, 202.4, 1755.3, 20379.3, 11595.5, 8783.…
# A tibble: 6 × 26
`Data Series` `2024` `2023` `2022` `2021` `2020` `2019` `2018` `2017` `2016`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Total Trade In… 9.98e5 9.19e5 8.77e5 7.15e5 5.90e5 5.93e5 5.61e5 4.93e5 4.31e5
2 Exports Of Ser… 5.29e5 4.81e5 4.68e5 3.82e5 3.00e5 3.07e5 2.87e5 2.42e5 2.12e5
3 Manufacturing … 4.71e2 4.91e2 6.85e2 4.90e2 2.37e2 2.67e2 3.70e2 2.47e2 2.85e2
4 Maintenance An… 1.08e4 1.10e4 1.01e4 8.83e3 8.17e3 9.66e3 9.41e3 7.71e3 8.42e3
5 Transport 1.73e5 1.50e5 1.90e5 1.44e5 9.36e4 9.43e4 8.89e4 7.00e4 5.92e4
6 Freight 1.46e5 1.24e5 1.67e5 1.30e5 7.99e4 7.65e4 7.17e4 5.45e4 4.59e4
# ℹ 16 more variables: `2015` <dbl>, `2014` <dbl>, `2013` <dbl>, `2012` <dbl>,
# `2011` <dbl>, `2010` <dbl>, `2009` <dbl>, `2008` <dbl>, `2007` <dbl>,
# `2006` <dbl>, `2005` <dbl>, `2004` <dbl>, `2003` <dbl>, `2002` <dbl>,
# `2001` <dbl>, `2000` <dbl>
Data Series 2024 2023 2022
Length:51 Min. : 57.8 Min. : 56.2 Min. : 59.6
Class :character 1st Qu.: 1855.3 1st Qu.: 1817.6 1st Qu.: 1603.0
Mode :character Median : 13495.7 Median : 12907.1 Median : 11700.4
Mean : 71188.6 Mean : 65608.8 Mean : 63405.1
3rd Qu.: 45456.5 3rd Qu.: 43514.1 3rd Qu.: 37732.8
Max. :997749.8 Max. :919117.0 Max. :877252.5
2021 2020 2019 2018
Min. : 62.4 Min. : 57.5 Min. : 52.1 Min. : 66.8
1st Qu.: 1350.0 1st Qu.: 1209.1 1st Qu.: 1134.8 1st Qu.: 1051.2
Median : 9180.3 Median : 9124.8 Median : 10250.8 Median : 9410.0
Mean : 51559.2 Mean : 42075.8 Mean : 41792.4 Mean : 39491.2
3rd Qu.: 35308.2 3rd Qu.: 30748.8 3rd Qu.: 29536.0 3rd Qu.: 27016.0
Max. :715093.5 Max. :590035.1 Max. :592824.3 Max. :561271.0
2017 2016 2015 2014
Min. : 56.1 Min. : 72.0 Min. : 46.9 Min. : 56.9
1st Qu.: 957.6 1st Qu.: 811.6 1st Qu.: 809.8 1st Qu.: 739.1
Median : 7848.2 Median : 6458.8 Median : 6194.0 Median : 6021.0
Mean : 34523.8 Mean : 29987.0 Mean : 30256.4 Mean : 28284.6
3rd Qu.: 23594.4 3rd Qu.: 22798.2 3rd Qu.: 23150.7 3rd Qu.: 22757.6
Max. :493353.8 Max. :431109.3 Max. :432922.3 Max. :406020.8
2013 2012 2011 2010
Min. : 78.8 Min. : 84.6 Min. : 62.5 Min. : 64.0
1st Qu.: 661.6 1st Qu.: 656.4 1st Qu.: 617.9 1st Qu.: 587.1
Median : 4647.2 Median : 4034.2 Median : 3397.2 Median : 3493.8
Mean : 25332.7 Mean : 22691.6 Mean : 20633.9 Mean : 19039.0
3rd Qu.: 21875.2 3rd Qu.: 19465.0 3rd Qu.: 17328.2 3rd Qu.: 16731.5
Max. :365055.0 Max. :327866.3 Max. :298227.8 Max. :273929.7
2009 2008 2007 2006
Min. : 51.0 Min. : 76.9 Min. : 130.4 Min. : 48.6
1st Qu.: 579.8 1st Qu.: 536.3 1st Qu.: 441.2 1st Qu.: 413.8
Median : 3015.6 Median : 3238.9 Median : 2213.8 Median : 2080.4
Mean : 16671.3 Mean : 17870.3 Mean : 15789.5 Mean : 13848.0
3rd Qu.: 13932.2 3rd Qu.: 14221.4 3rd Qu.: 13344.4 3rd Qu.: 11489.8
Max. :238962.6 Max. :254282.0 Max. :223936.4 Max. :195966.0
2005 2004 2003 2002
Min. : 24.7 Min. : 17.3 Min. : 17.2 Min. : 14.3
1st Qu.: 313.1 1st Qu.: 284.9 1st Qu.: 228.2 1st Qu.: 212.3
Median : 1524.5 Median : 1374.8 Median : 1108.7 Median : 1001.9
Mean : 11792.0 Mean : 10647.8 Mean : 8587.8 Mean : 7648.4
3rd Qu.: 10741.8 3rd Qu.: 9462.4 3rd Qu.: 7513.7 3rd Qu.: 8339.2
Max. :167532.8 Max. :150911.1 Max. :122427.9 Max. :109710.5
2001 2000
Min. : 22.6 Min. : 13.6
1st Qu.: 188.7 1st Qu.: 170.6
Median : 778.0 Median : 794.0
Mean : 7219.0 Mean : 6794.9
3rd Qu.: 8164.2 3rd Qu.: 7509.9
Max. :102892.5 Max. :96452.4
# A tibble: 0 × 26
# ℹ 26 variables: Data Series <chr>, 2024 <dbl>, 2023 <dbl>, 2022 <dbl>,
# 2021 <dbl>, 2020 <dbl>, 2019 <dbl>, 2018 <dbl>, 2017 <dbl>, 2016 <dbl>,
# 2015 <dbl>, 2014 <dbl>, 2013 <dbl>, 2012 <dbl>, 2011 <dbl>, 2010 <dbl>,
# 2009 <dbl>, 2008 <dbl>, 2007 <dbl>, 2006 <dbl>, 2005 <dbl>, 2004 <dbl>,
# 2003 <dbl>, 2002 <dbl>, 2001 <dbl>, 2000 <dbl>
Data Series 2024 2023 2022 2021 2020
0 0 0 0 0 0
2019 2018 2017 2016 2015 2014
0 0 0 0 0 0
2013 2012 2011 2010 2009 2008
0 0 0 0 0 0
2007 2006 2005 2004 2003 2002
0 0 0 0 0 0
2001 2000
0 0
drop_na() function to drop rows where any specified column contains a missing value.tibble [51 × 26] (S3: tbl_df/tbl/data.frame)
$ Data Series: chr [1:51] "Total Trade In Services" "Exports Of Services" "Manufacturing Services On Physical Inputs Owned By Others" "Maintenance And Repair Services" ...
$ 2024 : num [1:51] 997750 528568 471 10820 172971 ...
$ 2023 : num [1:51] 919117 481009 491 10981 149537 ...
$ 2022 : num [1:51] 877253 468191 686 10118 189649 ...
$ 2021 : num [1:51] 715094 382492 490 8834 144220 ...
$ 2020 : num [1:51] 590035 300005 237 8173 93560 ...
$ 2019 : num [1:51] 592824 307216 267 9663 94272 ...
$ 2018 : num [1:51] 561271 287141 370 9410 88889 ...
$ 2017 : num [1:51] 493354 241568 247 7712 69994 ...
$ 2016 : num [1:51] 431109 211836 285 8418 59213 ...
$ 2015 : num [1:51] 432922 210623 346 9315 64097 ...
$ 2014 : num [1:51] 406021 194843 424 9853 63919 ...
$ 2013 : num [1:51] 365055 177719 283 10767 57831 ...
$ 2012 : num [1:51] 327866 161769 250 9053 55586 ...
$ 2011 : num [1:51] 298228 150013 260 9343 53523 ...
$ 2010 : num [1:51] 273930 136872 290 8648 52607 ...
$ 2009 : num [1:51] 238963 117832 323 9128 43366 ...
$ 2008 : num [1:51] 254282 126155 452 8355 51109 ...
$ 2007 : num [1:51] 223936 110797 493 6606 43643 ...
$ 2006 : num [1:51] 195966 92675 535 5701 35877 ...
$ 2005 : num [1:51] 167533 75905 315 4798 32435 ...
$ 2004 : num [1:51] 150911 66796 353 3450 28630 ...
$ 2003 : num [1:51] 122428 52966 303 2883 23343 ...
$ 2002 : num [1:51] 109711 49936 370 3072 21539 ...
$ 2001 : num [1:51] 102893 46286 195 2100 20498 ...
$ 2000 : num [1:51] 96452 44855 202 1755 20379 ...
The trade_services tibble contains 26 attributes, as shown above.
The following preprocessing checks were conducted as part of data preparation:
trade_services dataset using glimpse() and str()duplicated() in the datasetcolSums(is.na())trade_services dataset will be used for both DataVis makeover 1 & 2After importing the trade_services dataset, we will filter for the three rows - Exports of Services, Imports of Services, and Total Trade in Services, which are essential for recreating the original visualization.
# A tibble: 3 × 26
`Data Series` `2024` `2023` `2022` `2021` `2020` `2019` `2018` `2017` `2016`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Total Trade In… 9.98e5 9.19e5 8.77e5 7.15e5 5.90e5 5.93e5 5.61e5 4.93e5 4.31e5
2 Exports Of Ser… 5.29e5 4.81e5 4.68e5 3.82e5 3.00e5 3.07e5 2.87e5 2.42e5 2.12e5
3 Imports Of Ser… 4.69e5 4.38e5 4.09e5 3.33e5 2.90e5 2.86e5 2.74e5 2.52e5 2.19e5
# ℹ 16 more variables: `2015` <dbl>, `2014` <dbl>, `2013` <dbl>, `2012` <dbl>,
# `2011` <dbl>, `2010` <dbl>, `2009` <dbl>, `2008` <dbl>, `2007` <dbl>,
# `2006` <dbl>, `2005` <dbl>, `2004` <dbl>, `2003` <dbl>, `2002` <dbl>,
# `2001` <dbl>, `2000` <dbl>
Key makeover changes:
1️⃣ Overall design improvements:
Replaced bar chart with a line chart for exports and imports –> Helps in tracking trends more effectively over time instead of color-coded bars per year.
Introduced a dashed line for total trade values –> Previously, total trade was only displayed as floating labels above bars, which could be overlooked.
Changed color usage –> Original chart had different colors for each year, making trend analysis harder. Now, consistent colors are used:
Exports are in red,
Imports are in blue,
Total Trade is in black (dashed),
Trade Surplus is in green, and
Trade Deficit is in red.
2️⃣ Trade Balance Integration:
Integration of key insights –> Previously trade balance was displayed separately as a small section at the bottom. Now, I have integrated it as a bar chart within the main graph
green bars represent trade surplus, and
red bars will represent trade deficit.

# Filter relevant rows (Exports, Imports, and Total Trade)
export_vs_import <- trade_services %>%
filter(`Data Series` %in% c("Exports Of Services", "Imports Of Services", "Total Trade In Services"))
# Convert dataset from wide to long format for ggplot
data_long <- export_vs_import %>%
pivot_longer(cols = -`Data Series`, names_to = "Year", values_to = "Value") %>%
mutate(Year = as.numeric(Year)) %>%
filter(Year >= 2020 & Year <= 2024)
# Extract `Total Trade In Services` from dataset
total_trade_data <- data_long %>%
filter(`Data Series` == "Total Trade In Services") %>%
select(Year, Value) %>%
rename(Total_Trade = Value)
# Calculate Trade Balance (Exports - Imports)
trade_balance <- data_long %>%
spread(`Data Series`, Value) %>%
mutate(Services_Trade_Balance = `Exports Of Services` - `Imports Of Services`) %>%
select(Year, Services_Trade_Balance)
# Merge Trade Balance & Total Trade into DataFrame
data_long <- left_join(data_long, trade_balance, by = "Year")
data_long <- left_join(data_long, total_trade_data, by = "Year")
# Increase plotting window size
options(repr.plot.width=16, repr.plot.height=9)
# Create the plot
p <- ggplot(data_long) +
# **Line Chart for Exports**
geom_line(data = subset(data_long, `Data Series` == "Exports Of Services"),
aes(x = Year, y = Value, color = "Exports"), linewidth = 1.5) +
# **Data Labels for Exports**
geom_text(data = subset(data_long, `Data Series` == "Exports Of Services"),
aes(x = Year, y = Value, label = round(Value, 1)),
vjust = -1, size = 2.5, color = "red") +
# **Line Chart for Imports**
geom_line(data = subset(data_long, `Data Series` == "Imports Of Services"),
aes(x = Year, y = Value, color = "Imports"), linewidth = 1.5) +
# **Data Labels for Imports**
geom_text(data = subset(data_long, `Data Series` == "Imports Of Services"),
aes(x = Year, y = Value, label = round(Value, 1)),
vjust = 1.5, size = 2.5, color = "blue") +
# **Line Chart for Total Trade (Black, Dashed)**
geom_line(data = total_trade_data,
aes(x = Year, y = Total_Trade, color = "Total Trade"), linewidth = 1.5, linetype = "dashed") +
# **Labels for Total Trade**
geom_text(data = total_trade_data,
aes(x = Year, y = Total_Trade + 20000, label = round(Total_Trade, 1)),
vjust = -0.5, size = 2.5, color = "black") +
# **Bar Chart for Trade Balance (Surplus = Green, Deficit = Red)**
geom_bar(data = trade_balance,
aes(x = Year, y = Services_Trade_Balance,
fill = ifelse(Services_Trade_Balance >= 0, "Surplus", "Deficit")),
stat = "identity", width = 0.5) +
# **Move Labels to the Top of Trade Balance Bars**
geom_text(data = trade_balance,
aes(x = Year, y = Services_Trade_Balance + 5000,
label = round(Services_Trade_Balance, 1)),
vjust = -0.5, size = 2.5, color = "black") +
# **Title & Labels**
labs(title = "Trends in International Trade in Services (2020-2024)",
subtitle = "Exports, Imports & Total Trade as Lines, Trade Balance as Bars",
x = "Year", y = "S$ Billion") +
# **Custom Theme**
theme_minimal(base_size = 10) +
# **Color Customization**
scale_color_manual(name = "Category", values = c("Exports" = "red", "Imports" = "blue", "Total Trade" = "black")) +
scale_fill_manual(name = "Trade Balance", values = c("Surplus" = "green", "Deficit" = "red")) +
# **Formatting**
theme(
legend.position = "bottom",
panel.grid.major.y = element_line(color = "gray", linetype = "dashed"),
axis.text.x = element_text(face = "bold"))
# Display the plot
print(p)I conducted time-series analysis to analyze trends in exports, imports, and total trade from 2000 to 2024. First, I visualized the historical data using ggplot2, ensuring clear differentiation of trends by assigning red for exports, blue for imports, and black for total trade. This provided insights into the overall growth patterns and fluctuations over time.
Key observations:
# Load necessary libraries
library(ggplot2)
library(tidyverse)
library(forecast)
library(plotly)
# Convert data from wide to long format
export_vs_import_long <- export_vs_import %>%
pivot_longer(cols = -`Data Series`, names_to = "Year", values_to = "Value") %>%
mutate(Year = as.numeric(Year)) %>%
arrange(Year) # Ensure years are sorted in ascending order
# Create separate data frames for each category to ensure proper order
exports_data <- export_vs_import_long %>% filter(`Data Series` == "Exports Of Services")
imports_data <- export_vs_import_long %>% filter(`Data Series` == "Imports Of Services")
total_trade_data <- export_vs_import_long %>% filter(`Data Series` == "Total Trade In Services")
# Convert to time-series format (with correctly sorted values)
exports_ts <- ts(exports_data$Value, start = exports_data$Year[1], frequency = 1)
imports_ts <- ts(imports_data$Value, start = imports_data$Year[1], frequency = 1)
total_trade_ts <- ts(total_trade_data$Value, start = total_trade_data$Year[1], frequency = 1)
# Create Data Frame for Plotting
df <- data.frame(
Year = rep(exports_data$Year, 3),
Value = c(exports_data$Value, imports_data$Value, total_trade_data$Value),
Category = rep(c("Exports", "Imports", "Total Trade"), each = nrow(exports_data))
)
# Create a ggplot with explicit group aesthetic
p <- ggplot(df, aes(x = Year, y = Value, color = Category, group = Category,
text = paste0("Year: ", Year, "<br>Value: ", round(Value, 1)))) +
geom_line(linewidth = 1.2) + # Use linewidth instead of size (ggplot2 3.4.0+)
scale_color_manual(values = c("Exports" = "red", "Imports" = "blue", "Total Trade" = "black")) +
ggtitle("Export vs Import vs Total Trade Trends") +
ylab("Value (in Millions)") + xlab("Year") +
theme_minimal()
# Convert to interactive plot with tooltips
interactive_plot <- ggplotly(p, tooltip = "text")
# Display interactive plot
interactive_plotBased on the model accuracy metrics, ARIMA outperforms ETS in forecasting both exports and imports. ARIMA consistently shows lower RMSE, MAE, and MAPE values, indicating that its predictions are more precise and closer to actual observations. Thus, ARIMA will be the preferred model for time series forecasting.
ME RMSE MAE MPE MAPE MASE
Training set 2250.362 11275.74 8425.338 -0.1740338 7.973583 0.5737539
Test set 66791.835 86554.64 77415.700 13.2622158 16.803450 5.2719028
ACF1
Training set 0.05311589
Test set NA
ME RMSE MAE MPE MAPE MASE
Training set 3516.706 10885.93 7472.406 2.582523 5.207104 0.5088606
Test set 54848.687 74121.02 67064.995 10.655012 14.727052 4.5670340
ACF1
Training set -0.109228
Test set NA
# A tibble: 4 × 5
Model Type RMSE MAE MAPE
<chr> <chr> <dbl> <dbl> <dbl>
1 ETS Exports 86555. 77416. 16.8
2 ETS Imports 89216. 74235. 17.4
3 ARIMA Exports 74121. 67065. 14.7
4 ARIMA Imports 81957. 68397. 16.0
library(forecast)
library(tidyverse)
library(ggplot2)
# Set forecast horizon (e.g., last 5 years)
horizon <- 5
# Ensure there are enough observations for training and test
if (length(exports_ts) <= horizon | length(imports_ts) <= horizon) {
stop("Not enough data points for proper train-test split!")
}
# Split data: Use last `horizon` years as the test set
train_exports <- head(exports_ts, length(exports_ts) - horizon)
test_exports <- tail(exports_ts, horizon)
train_imports <- head(imports_ts, length(imports_ts) - horizon)
test_imports <- tail(imports_ts, horizon)
# Fit ETS models
ets_exports <- ets(train_exports)
ets_imports <- ets(train_imports)
# Fit ARIMA models
arima_exports <- auto.arima(train_exports)
arima_imports <- auto.arima(train_imports)
# 🔹 Generate Forecasts BEFORE extracting forecast values
ets_forecast_exports <- forecast(ets_exports, h = horizon)
ets_forecast_imports <- forecast(ets_imports, h = horizon)
arima_forecast_exports <- forecast(arima_exports, h = horizon)
arima_forecast_imports <- forecast(arima_imports, h = horizon)
# 🔹 Extract the forecasted mean values
ets_forecast_exports_values <- as.numeric(ets_forecast_exports$mean)
ets_forecast_imports_values <- as.numeric(ets_forecast_imports$mean)
arima_forecast_exports_values <- as.numeric(arima_forecast_exports$mean)
arima_forecast_imports_values <- as.numeric(arima_forecast_imports$mean)
# Ensure test sets are numeric
test_exports <- as.numeric(test_exports)
test_imports <- as.numeric(test_imports)
# Ensure test and forecast lengths match
if (length(test_exports) != length(ets_forecast_exports_values)) {
test_exports <- head(test_exports, length(ets_forecast_exports_values))
}
if (length(test_imports) != length(ets_forecast_imports_values)) {
test_imports <- head(test_imports, length(ets_forecast_imports_values))
}
# 🔹 Use accuracy() correctly with the model object
ets_accuracy_exports <- forecast::accuracy(ets_forecast_exports, test_exports)
ets_accuracy_imports <- forecast::accuracy(ets_forecast_imports, test_imports)
arima_accuracy_exports <- forecast::accuracy(arima_forecast_exports, test_exports)
arima_accuracy_imports <- forecast::accuracy(arima_forecast_imports, test_imports)
print(ets_accuracy_exports)
print(arima_accuracy_exports)
# 🔹 Combine accuracy results into a structured dataframe
accuracy_df <- tibble(
Model = rep(c("ETS", "ARIMA"), each = 2),
Type = rep(c("Exports", "Imports"), times = 2),
RMSE = c(ets_accuracy_exports["Test set", "RMSE"],
ets_accuracy_imports["Test set", "RMSE"],
arima_accuracy_exports["Test set", "RMSE"],
arima_accuracy_imports["Test set", "RMSE"]),
MAE = c(ets_accuracy_exports["Test set", "MAE"],
ets_accuracy_imports["Test set", "MAE"],
arima_accuracy_exports["Test set", "MAE"],
arima_accuracy_imports["Test set", "MAE"]),
MAPE = c(ets_accuracy_exports["Test set", "MAPE"],
ets_accuracy_imports["Test set", "MAPE"],
arima_accuracy_exports["Test set", "MAPE"],
arima_accuracy_imports["Test set", "MAPE"])
)
# Print final accuracy dataframe
print(accuracy_df)Next, I applied ARIMA modeling (auto.arima()) using the forecast package to predict trade values for the next five years (2025-2029).




library(forecast)
# Apply ARIMA Model
arima_exports <- auto.arima(exports_ts)
arima_imports <- auto.arima(imports_ts)
arima_total_trade <- auto.arima(total_trade_ts)
# Forecast for the next 5 years
arima_forecast_exports <- forecast(arima_exports, h = 5)
arima_forecast_imports <- forecast(arima_imports, h = 5)
arima_forecast_total_trade <- forecast(arima_total_trade, h = 5)
# Plot ARIMA forecasts
autoplot(arima_forecast_exports) + ggtitle("ARIMA Forecast for Exports")
autoplot(arima_forecast_imports) + ggtitle("ARIMA Forecast for Imports")
autoplot(arima_forecast_total_trade) + ggtitle("ARIMA Forecast for Total Trade")
# Convert forecasts to data frames for ggplot
df_exports <- as.data.frame(arima_forecast_exports) %>% mutate(Year = seq(max(exports_data$Year) + 1, by = 1, length.out = 5), Category = "Exports")
df_imports <- as.data.frame(arima_forecast_imports) %>% mutate(Year = seq(max(imports_data$Year) + 1, by = 1, length.out = 5), Category = "Imports")
df_total_trade <- as.data.frame(arima_forecast_total_trade) %>% mutate(Year = seq(max(total_trade_data$Year) + 1, by = 1, length.out = 5), Category = "Total Trade")
# Combine all forecasts
df_forecast <- bind_rows(df_exports, df_imports, df_total_trade)
# Create the forecast plot
ggplot(df_forecast, aes(x = Year, y = `Point Forecast`, color = Category)) +
geom_line(linewidth = 1.5) + # Use linewidth instead of size (ggplot2 3.4.0+)
scale_color_manual(values = c("Exports" = "red", "Imports" = "blue", "Total Trade" = "black")) +
ggtitle("ARIMA Forecast for Exports, Imports, and Total Trade") +
ylab("Forecasted Value (in Millions)") + xlab("Year") +
theme_minimal()We will also be testing the accuracy of the forecast model to evaluate its performance and reliability.
Standard market practices for evaluating forecast accuracy rely on key metrics such as MAPE, RMSE, MAE, and ACF1.
Based on these measures, the imports forecast is highly accurate (MAPE ~4.84%), while exports and total trade forecasts fall within the “good” range (MAPE ~6.10% and ~5.23%).
The absence of significant autocorrelation in residuals suggests the models effectively capture trends.
The high RMSE values, particularly for total trade, indicate potential reliability issues due to large fluctuations in data.
### Forecast Accuracy for Exports ###
ME RMSE MAE MPE MAPE MASE
Training set 5842.817 21775.54 13733.86 2.777663 6.099909 0.6402954
Test set -215550.380 217655.84 215550.38 -53.417606 53.417606 10.0493182
ACF1
Training set -0.09397345
Test set NA
### Forecast Accuracy for Imports ###
ME RMSE MAE MPE MAPE MASE
Training set 3887.432 15372.93 10045.78 1.659329 4.840996 0.5509206
Test set -182952.628 184188.44 182952.63 -49.695067 49.695067 10.0333023
ACF1
Training set 0.03707208
Test set NA
### Forecast Accuracy for Total Trade ###
ME RMSE MAE MPE MAPE MASE
Training set 9537.844 35750.43 23074.03 2.170273 5.229894 0.5884102
Test set -399957.217 403059.64 399957.22 -51.756773 51.756773 10.1992978
ACF1
Training set -0.02527777
Test set NA
# Compute and print accuracy
cat("### Forecast Accuracy for Exports ###\n")
print(forecast::accuracy(arima_forecast_exports, test_exports))
# Print forecast accuracy for imports
cat("\n### Forecast Accuracy for Imports ###\n")
print(forecast::accuracy(arima_forecast_imports, test_imports))
# Print forecast accuracy for total trade
cat("\n### Forecast Accuracy for Total Trade ###\n")
print(forecast::accuracy(arima_forecast_total_trade, test_total_trade))The original visualizations analyzed in this analysis are sourced from here.


In the below section, we will evaluate the effectiveness of the above visualization by identifying its pros and cons, focusing on aspects such as (1) clarity, and (2) visual appeal:
| Pros | Cons | Suggested fixes |
|---|---|---|
| Clearly labeled service categories with values and percentages – Each category is distinctly labeled, making it easy to understand the contributions of each service type. | Pie charts are not effective for comparing data slices – When multiple categories have close values, it is difficult to differentiate them accurately. | Replace pie charts with bar charts or grouped bar charts - for clearer comparison of service categories. |
| Percentage values enhance clarity – Helps users quickly understand each category’s contribution to total exports/ imports. | Pie charts are only useful when there are limited categories – Too many slices make it difficult to interpret, as smaller sections become unreadable. | Replace with a bar chart or grouped bar chart to improve readability. |
| The pie chart format makes trend comparison difficult – Pie charts only represent a single year and do not show how exports have changed over time. | Add a trend chart (e.g., line graph or bar chart) – Including historical data can help identify patterns and trends over multiple years. |
| Pros | Cons | Suggested fixes |
|---|---|---|
| Good use of icons to represent different services – This makes the chart more engaging and helps with intuitive understanding. | Pie charts distort perception – it’s hard to accurately compare slice sizes, especially when they are similar. | Use a bar chart, grouped bar chart or treemaps, which allows for more proportional representation of each service category. |
| Well-structured layout with categories spread around the chart - The service categories for both exports and imports maintain a similar placement | Pie chart format is too cluttered – The large number of slices makes it difficult to read smaller segments and their labels. | Use a bar chart, grouped bar chart or treemaps for better spacing and spatial representation. |
| Too many elements are packed into the chart, making it overwhelming. | Simplify by removing non-essential elements - i.e: icons | |
| No legend for color segmentation – The chart uses various colors for different service categories, but there is no clear legend explaining their grouping or significance. | Include a legend or categorize colors meaningfully – Assign gradient based color scale to visually emphasize trade volume intensity. |
After importing the trade_services data set, we will filter for the key categories that contribute to the exports/ imports of services in Singapore, excluding subcategories. The 12 major categories are identified based on indentation, as shown below:
| Data Series | Shortened label |
|---|---|
| Manufacturing Services On Physical Inputs Owned By Others | Manuf. Services |
| Maintenance And Repair Services | Maintenance & Repair |
| Transport | Transport |
| Travel | Travel |
| Insurance | Insurance |
| Government Goods And Services | Govt. Services |
| Construction | Construction |
| Financial | Financial Services |
| Telecommunications, Computer & Information | Telecom & IT |
| Charges For The Use Of Intellectual Property | Intellectual Property |
| Personal, Cultural And Recreational | Cultural & Recreational |
| Other Business Services | Other Biz Services |
We first clean the “Data Series” column by removing leading and trailing spaces to ensure consistency. Next, we identify the positions of “Exports Of Services” and “Imports Of Services” in the dataset, extracting only the rows between these markers for exports and those following the imports marker for imports. We then filter both datasets to retain only the 12 major service categories as hghlighted above.
Finally, we combine the cleaned exports and imports data into a single structured table, adding a “Trade Type” column to differentiate between imports/ exports.
# Trim leading/trailing spaces from 'Data Series' column
trade_services$`Data Series` <- trimws(trade_services$`Data Series`)
# Identify row indices for "Exports Of Services" and "Imports Of Services"
export_start_idx <- which(trade_services$`Data Series` == "Exports Of Services")
import_start_idx <- which(trade_services$`Data Series` == "Imports Of Services")
# Extract rows between "Exports Of Services" and "Imports Of Services"
exports_df <- trade_services[(export_start_idx + 1):(import_start_idx - 1), ]
imports_df <- trade_services[(import_start_idx + 1):nrow(trade_services), ]
# Define the 12 major categories
major_categories <- c(
"Manufacturing Services On Physical Inputs Owned By Others",
"Maintenance And Repair Services",
"Transport",
"Travel",
"Insurance",
"Government Goods And Services",
"Construction",
"Financial",
"Telecommunications, Computer & Information",
"Charges For The Use Of Intellectual Property",
"Personal, Cultural And Recreational",
"Other Business Services"
)
# Filter only the major categories for exports and imports
exports_major <- exports_df %>% filter(`Data Series` %in% major_categories)
imports_major <- imports_df %>% filter(`Data Series` %in% major_categories)
# Add a column to indicate whether it's exports or imports
exports_major$`Trade Type` <- "Exports"
imports_major$`Trade Type` <- "Imports"
# Combine into a single dataframe
final_trade_df <- bind_rows(exports_major, imports_major)
# Reorder columns: Move "Trade Type" to the second column
final_trade_df <- final_trade_df %>% select(`Data Series`, `Trade Type`, everything())
# View the final structured data
print(final_trade_df)# A tibble: 24 × 27
`Data Series` `Trade Type` `2024` `2023` `2022` `2021` `2020` `2019` `2018`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Manufacturing … Exports 4.71e2 4.91e2 6.85e2 4.90e2 237. 267. 370.
2 Maintenance An… Exports 1.08e4 1.10e4 1.01e4 8.83e3 8173. 9663. 9410
3 Transport Exports 1.73e5 1.50e5 1.90e5 1.44e5 93560 94272. 88889.
4 Travel Exports 3.19e4 2.79e4 1.57e4 5.40e3 7527. 27755. 27367.
5 Insurance Exports 1.26e4 1.11e4 1.15e4 9.13e3 8438 8817. 9485.
6 Government Goo… Exports 4.77e2 4.64e2 4.44e2 4.22e2 412. 419. 414.
7 Construction Exports 2.10e3 2.04e3 1.62e3 1.20e3 1272. 1742. 1642.
8 Financial Exports 7.16e4 6.53e4 5.61e4 5.31e4 49688. 45944. 42840.
9 Telecommunicat… Exports 4.11e4 3.93e4 3.42e4 3.19e4 26826. 20252. 20533.
10 Charges For Th… Exports 2.63e4 2.42e4 1.79e4 1.64e4 12866. 12297. 12084.
# ℹ 14 more rows
# ℹ 18 more variables: `2017` <dbl>, `2016` <dbl>, `2015` <dbl>, `2014` <dbl>,
# `2013` <dbl>, `2012` <dbl>, `2011` <dbl>, `2010` <dbl>, `2009` <dbl>,
# `2008` <dbl>, `2007` <dbl>, `2006` <dbl>, `2005` <dbl>, `2004` <dbl>,
# `2003` <dbl>, `2002` <dbl>, `2001` <dbl>, `2000` <dbl>
Key makeover changes:
1️⃣ Single chart for imports and exports:
2️⃣ Improved readability and comparability:
Grouped bars (red for exports, blue for imports) –> Offer a clearer visual distinction between trade types.
Sorted categories (largest to smallest) –> Focus attention on key contributors, ensuring the most impactful sectors are easily identified.
3️⃣ Simplified layout:
4️⃣ Interactive insights:
# Load required libraries
library(ggplot2)
library(dplyr)
library(tidyr)
library(plotly) # For interactive tooltips
# Filter data for the year 2024
trade_2024 <- final_trade_df %>%
select(`Data Series`, `Trade Type`, `2024`) %>%
rename(Value = `2024`)
# Shorten long category labels for better readability
trade_2024$`Data Series` <- gsub("Manufacturing Services On Physical Inputs Owned By Others", "Manuf. Services", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Maintenance And Repair Services", "Maintenance & Repair", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Government Goods And Services", "Govt. Services", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Financial", "Financial Services", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Telecommunications, Computer & Information", "Telecom & IT", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Charges For The Use Of Intellectual Property", "Intellectual Property", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Personal, Cultural And Recreational", "Cultural & Recreational", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Other Business Services", "Other Biz Services", trade_2024$`Data Series`)
# Compute Total Exports & Imports for Percentage Calculation
total_exports <- sum(trade_2024$Value[trade_2024$`Trade Type` == "Exports"], na.rm = TRUE)
total_imports <- sum(trade_2024$Value[trade_2024$`Trade Type` == "Imports"], na.rm = TRUE)
# Add Percentage Column
trade_2024 <- trade_2024 %>%
mutate(Percentage = ifelse(`Trade Type` == "Exports",
Value / total_exports * 100,
Value / total_imports * 100))
# Sum values for sorting (descending order)
category_order <- trade_2024 %>%
group_by(`Data Series`) %>%
summarise(Total_Trade = sum(Value, na.rm = TRUE)) %>%
arrange(desc(Total_Trade)) %>%
pull(`Data Series`)
# Convert `Data Series` to factor for correct sorting
trade_2024$`Data Series` <- factor(trade_2024$`Data Series`, levels = category_order)
# Define custom colors for Exports (Red) and Imports (Blue)
custom_colors <- c("Exports" = "red", "Imports" = "blue")
# Create the ggplot object
gg <- ggplot(trade_2024, aes(x = `Data Series`, y = Value, fill = `Trade Type`,
text = paste("Category:", `Data Series`,
"<br>Trade Type:", `Trade Type`,
"<br>Value: S$", scales::comma(Value),
"<br>Share:", round(Percentage, 1), "%"))) +
geom_bar(stat = "identity", position = position_dodge(width = 0.8)) + # Side-by-side bars
scale_fill_manual(values = custom_colors) + # Apply custom colors
scale_y_continuous(labels = scales::comma) + # Format y-axis labels
labs(title = "Singapore Trade in Services - 2024",
subtitle = "Side-by-Side Comparison of Imports and Exports for 12 Service Categories",
x = "Service Category",
y = "Trade Value",
fill = "Trade Type") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1), # Rotate x-axis labels
legend.position = "bottom")
# Convert to interactive plot with tooltips
ggplotly(gg, tooltip = "text")Key makeover changes:
1️⃣ Transition from Pie charts to Treemaps
2️⃣ Added plotly inteactive features:
3️⃣ Gradient-based color scale for trade value:
# Load required libraries
library(dplyr)
library(tidyr)
library(treemap)
library(plotly)
# Filter data for the year 2024
trade_2024 <- final_trade_df %>%
select(`Data Series`, `Trade Type`, `2024`) %>%
rename(Value = `2024`)
# Shorten long category labels for better readability
trade_2024 <- trade_2024 %>%
mutate(Short_Label = case_when(
`Data Series` == "Manufacturing Services On Physical Inputs Owned By Others" ~ "Manuf. Services",
`Data Series` == "Maintenance And Repair Services" ~ "Maintenance & Repair",
`Data Series` == "Government Goods And Services" ~ "Govt. Services",
`Data Series` == "Financial" ~ "Financial Services",
`Data Series` == "Telecommunications, Computer & Information" ~ "Telecom & IT",
`Data Series` == "Charges For The Use Of Intellectual Property" ~ "Intellectual Property",
`Data Series` == "Personal, Cultural And Recreational" ~ "Cultural & Recreational",
`Data Series` == "Other Business Services" ~ "Other Biz Services",
TRUE ~ `Data Series`
))
# Split the dataset into Exports and Imports
trade_exports <- trade_2024 %>% filter(`Trade Type` == "Exports")
trade_imports <- trade_2024 %>% filter(`Trade Type` == "Imports")
# Create Treemap Data for Exports
treemap_exports <- treemap(trade_exports,
index = "Short_Label",
vSize = "Value",
type = "index",
palette = "Blues",
title = "Exports Treemap - 2024",
draw = FALSE)
# Create Treemap Data for Imports
treemap_imports <- treemap(trade_imports,
index = "Short_Label",
vSize = "Value",
type = "index",
palette = "Oranges",
title = "Imports Treemap - 2024",
draw = FALSE)
# Convert Exports Treemap Data to Plotly (Darker Green for Higher Values)
p1_interactive <- plot_ly(
data = treemap_exports$tm,
labels = ~Short_Label,
parents = "",
values = ~vSize,
text = ~paste("Category:", Short_Label, "<br>Value:", vSize),
type = "treemap",
textinfo = "label+text",
marker = list(
colorscale = list(c(0, 1), c("#d9f2d9", "#006400")), # Light to Dark Green
cmin = min(treemap_exports$tm$vSize),
cmax = max(treemap_exports$tm$vSize),
colorbar = list(title = "Trade Value")
)
) %>%
layout(title = "Exports Treemap - 2024")
# Convert Imports Treemap Data to Plotly (Darker Green for Higher Values)
p2_interactive <- plot_ly(
data = treemap_imports$tm,
labels = ~Short_Label,
parents = "",
values = ~vSize,
text = ~paste("Category:", Short_Label, "<br>Value:", vSize),
type = "treemap",
textinfo = "label+text",
marker = list(
colorscale = list(c(0, 1), c("#d9f2d9", "#006400")), # Light to Dark Green
cmin = min(treemap_imports$tm$vSize),
cmax = max(treemap_imports$tm$vSize),
colorbar = list(title = "Trade Value")
)
) %>%
layout(title = "Imports Treemap - 2024")
p1_interactive
p2_interactiveKey observations:
Rapid Growth: Singapore’s service exports have grown significantly, reinforcing its role as a global trade hub.
Leading Sectors: Transport and Other Business Services dominate, showing the steepest growth.
Financial & Digital Expansion: Financial Services, Telecom & IT, and Intellectual Property have surged, driven by digitalization and economic policies.
Resilience & Acceleration: Despite slower growth around 2012 (likely due to economic shocks), post-2018 expansion highlights policy-driven trade growth.

#| fig-width: 8
#| fig-height: 6
# Load required libraries
library(dplyr)
library(tidyr)
library(CGPfunctions)
# Ensure column names are trimmed properly
colnames(final_trade_df) <- trimws(colnames(final_trade_df))
# Shorten long category labels for better readability (Fixed repetitive label issue)
final_trade_df$`Data Series` <- gsub("Manufacturing Services On Physical Inputs Owned By Others", "Manuf. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Maintenance And Repair Services", "Maintenance & Repair", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Government Goods And Services", "Govt. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("^Financial$", "Financial Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Telecommunications, Computer & Information", "Telecom & IT", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Charges For The Use Of Intellectual Property", "Intellectual Property", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Personal, Cultural And Recreational", "Cultural & Recreational", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Other Business Services", "Other Biz Services", final_trade_df$`Data Series`)
# Filter for Exports only
exports_df <- final_trade_df %>% filter(`Trade Type` == "Exports")
# Select relevant columns: Data Series, 2000, 2012, 2024
exports_slope <- exports_df %>%
select(`Data Series`, `2000`, `2012`, `2024`) %>%
drop_na()
# Keep only the top 10 services in 2024 for clarity
exports_slope <- exports_slope %>% arrange(desc(`2024`)) %>% head(10)
# Convert data to long format for `newggslopegraph()`
exports_long <- exports_slope %>%
pivot_longer(cols = c(`2000`, `2012`, `2024`), names_to = "Year", values_to = "Value")
# Convert Year to character (Fix for newggslopegraph)
exports_long$Year <- as.character(exports_long$Year)
# Increase figure height to improve label spacing
options(repr.plot.width = 14, repr.plot.height = 12)
# Create slopegraph with **2000, 2012, and 2024**
newggslopegraph(dataframe = exports_long,
Times = Year,
Measurement = Value,
Grouping = `Data Series`,
Title = "Trade in Services (Exports) - Slopegraph (2000-2012-2024)",
SubTitle = "Top 10 Services by Export Value",
LineThickness = 1.0,
DataTextSize = 2.5)Key observations:
Strong Growth in Imports: Similar to exports, Singapore’s service imports have grown significantly, reflecting its global trade connectivity.
Transport & Other Business Services Lead: These two sectors dominate imports, mirroring export trends, but at slightly lower values.
Rising Financial Services & Telecom & IT: These sectors have seen notable import growth, indicating increasing reliance on foreign expertise and digital services.
Higher Import Dependency in Intellectual Property: Compared to exports, imports in Intellectual Property have increased faster, suggesting rising licensing and royalty payments.

#| fig-width: 8
#| fig-height: 6
# Load required libraries
library(dplyr)
library(tidyr)
library(CGPfunctions)
# Ensure column names are trimmed properly
colnames(final_trade_df) <- trimws(colnames(final_trade_df))
# Shorten long category labels for better readability (Fixed repetitive label issue)
final_trade_df$`Data Series` <- gsub("Manufacturing Services On Physical Inputs Owned By Others", "Manuf. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Maintenance And Repair Services", "Maintenance & Repair", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Government Goods And Services", "Govt. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("^Financial$", "Financial Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Telecommunications, Computer & Information", "Telecom & IT", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Charges For The Use Of Intellectual Property", "Intellectual Property", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Personal, Cultural And Recreational", "Cultural & Recreational", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Other Business Services", "Other Biz Services", final_trade_df$`Data Series`)
# Filter for Imports only
imports_df <- final_trade_df %>% filter(`Trade Type` == "Imports")
# Select relevant columns: Data Series, 2000, 2012, 2024
imports_slope <- imports_df %>%
select(`Data Series`, `2000`, `2012`, `2024`) %>%
drop_na()
# Keep only the top 10 services in 2024 for clarity
imports_slope <- imports_slope %>% arrange(desc(`2024`)) %>% head(10)
# Convert data to long format for `newggslopegraph()`
imports_long <- imports_slope %>%
pivot_longer(cols = c(`2000`, `2012`, `2024`), names_to = "Year", values_to = "Value")
# Convert Year to character (Fix for newggslopegraph)
imports_long$Year <- as.character(imports_long$Year)
# Increase figure height to improve label spacing
options(repr.plot.width = 14, repr.plot.height = 12)
# Create slopegraph with **2000, 2012, and 2024**
newggslopegraph(dataframe = imports_long,
Times = Year,
Measurement = Value,
Grouping = `Data Series`,
Title = "Trade in Services (Imports) - Slopegraph (2000-2012-2024)",
SubTitle = "Top 10 Services by Import Value",
LineThickness = 1.0,
DataTextSize = 2.5)The original visualizations analyzed in this analysis are sourced from here.


In the below section, we will evaluate the effectiveness of the above visualization by identifying its pros and cons, focusing on aspects such as (1) clarity, and (2) visual appeal:
| Pros | Cons | Suggested fixes |
|---|---|---|
| Clear regional differentiation - The color-coded regions (North America, Europe, Asia, Oceania) make it easier to distinguish areas. | Color scheme inconsistency - The colors used for regions in the world map do not match the second visualization, leading to confusion. | Ensure consistent colors across both graphics (e.g., North America should be red in both images). |
| Easy country identification - The use of country flags provides quick recognition of major trading partners on the world map. | Lack of legend for flags - It is unclear whether the flags represent top import/export partners or something else. | Include a small legend explaining the significance of the flags. |
| Well-placed title - The main title is clear and ensures users immediately understand the topic. | World map does not indicate rankings - It is difficult to determine the largest trading partners. | Add ranking indicators - e.g.: numbered markers, or a gradient effect on the map. |
| Need to scroll to see rankings - Users must scroll down to understand that U.S is the top trading partner, followed by EU, and China. | Overlay rankings on the world map so users can see top trading partners at a glance. | |
| Lack of clear distinction between Regions, Countries, and Economic & Political Unions - The visualization presents them together, making it difficult to see how rankings differ. | Introduce clear labels or grouping for Regions, Countries, and Economic & Political Unions in both the world map and bar chart. |
| Pros | Cons | Suggested fixes |
|---|---|---|
| Engaging world map design - The dotted world map with vibrant colors creates a modern and appealing look. | Color inconsistency between the two visuals - North America is red in the map but blue in the bar chart. | Use a unified color scheme so regions in both visuals match. |
The code chunk below imports the (1) Exports of Services by Major Trading Partner datasets, and (2) Imports of Services by Major Trading Partner datasets downloaded from the Department of Statistics Singapore (DOS), using the read_excel() function from the readxl package. These datasets contain trade value of exports and imports of services by major trading partner across various 2000 to 2023, which will be processed and analyzed in subsequent steps.
export_partners data structureglimpse(): provides a transposed overview of a dataset, showing variables and their types in a concise format.head(): displays the first few rows of a dataset (default is 6 rows) to give a quick preview of the data.summary(): generates a statistical summary of each variable, including measures like mean, median, and range for numeric data.duplicated():returns a logical vector indicating which elements or rows in a vector or data frame are duplicates.colSums(is.na()): counts the number of missing values (NA) in each column of the data frame.str(): use str() to display the column names, data types, and a preview of the data.Rows: 68
Columns: 25
$ `Data Series` <chr> "Asia", "Bangladesh", "Brunei Darussalam", "Cambodia", "…
$ `2023` <dbl> 170787.7, 985.7, 960.1, 475.6, 21686.1, 9909.4, 8542.6, …
$ `2022` <dbl> 174243.3, 903.9, 891.2, 389.5, 21307.3, 9730.1, 8366.6, …
$ `2021` <dbl> 147125.9, 674.8, 579.0, 256.0, 18126.2, 7817.0, 6222.5, …
$ `2020` <dbl> 115989.3, 581.6, 533.4, 250.3, 15063.2, 6337.6, 5351.4, …
$ `2019` <dbl> 114573.1, 642.4, 556.4, 259.1, 11273.2, 6383.4, 6781.8, …
$ `2018` <dbl> 104182.5, 664.0, 495.9, 290.4, 10466.3, 5957.8, 5895.2, …
$ `2017` <dbl> 85093.3, 547.7, 463.9, 226.8, 8239.5, 5326.3, 5615.7, 25…
$ `2016` <dbl> 70461.8, 482.6, 420.5, 196.1, 7351.8, 4422.4, 5088.6, 23…
$ `2015` <dbl> 69881.1, 424.3, 465.4, 114.3, 7260.0, 4236.4, 5032.0, 17…
$ `2014` <dbl> 62802.7, 384.3, 520.4, 86.3, 5841.4, 4276.0, 4837.1, 159…
$ `2013` <dbl> 55949.9, 347.5, 474.4, 89.2, 5184.6, 4857.4, 4535.8, 155…
$ `2012` <dbl> 50496.3, 349.1, 464.1, 115.0, 5156.3, 4375.0, 4062.7, 14…
$ `2011` <dbl> 48318.2, 333.5, 468.5, 109.2, 5104.8, 4244.7, 3537.8, 13…
$ `2010` <dbl> 46254.5, 272.6, 508.1, 78.4, 4928.8, 4089.3, 3291.1, 114…
$ `2009` <chr> "40325.300000000003", "209.3", "324.39999999999998", "99…
$ `2008` <dbl> 43935.5, 264.8, 272.6, 139.7, 4676.6, 3502.5, 3321.9, 93…
$ `2007` <dbl> 39789.5, 284.0, 248.3, 88.1, 4429.7, 3191.3, 3026.0, 136…
$ `2006` <dbl> 33098.0, 285.5, 120.6, 110.8, 3555.3, 2448.5, 2969.8, 10…
$ `2005` <chr> "28070.400000000001", "274.60000000000002", "123.3", "77…
$ `2004` <chr> "25457.1", "188.9", "138.6", "68.7", "2313.6999999999998…
$ `2003` <chr> "18942", "157", "151.4", "45.3", "2059.6999999999998", "…
$ `2002` <chr> "17741.7", "118.3", "144.30000000000001", "37.9", "1912"…
$ `2001` <chr> "16226.5", "121.8", "97.8", "29.8", "1866.5", "735.4", "…
$ `2000` <chr> "15816.9", "89.7", "100.5", "18.399999999999999", "1860.…
# A tibble: 6 × 25
`Data Series` `2023` `2022` `2021` `2020` `2019` `2018` `2017` `2016` `2015`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Asia 1.71e5 1.74e5 1.47e5 1.16e5 1.15e5 1.04e5 85093. 70462. 69881.
2 Bangladesh 9.86e2 9.04e2 6.75e2 5.82e2 6.42e2 6.64e2 548. 483. 424.
3 Brunei Darussa… 9.60e2 8.91e2 5.79e2 5.33e2 5.56e2 4.96e2 464. 420. 465.
4 Cambodia 4.76e2 3.89e2 2.56e2 2.50e2 2.59e2 2.90e2 227. 196. 114.
5 Hong Kong 2.17e4 2.13e4 1.81e4 1.51e4 1.13e4 1.05e4 8240. 7352. 7260
6 India 9.91e3 9.73e3 7.82e3 6.34e3 6.38e3 5.96e3 5326. 4422. 4236.
# ℹ 15 more variables: `2014` <dbl>, `2013` <dbl>, `2012` <dbl>, `2011` <dbl>,
# `2010` <dbl>, `2009` <chr>, `2008` <dbl>, `2007` <dbl>, `2006` <dbl>,
# `2005` <chr>, `2004` <chr>, `2003` <chr>, `2002` <chr>, `2001` <chr>,
# `2000` <chr>
Data Series 2023 2022 2021
Length:68 Min. : 296.0 Min. : 169.8 Min. : 191.5
Class :character 1st Qu.: 789.3 1st Qu.: 685.9 1st Qu.: 560.1
Mode :character Median : 3133.4 Median : 3536.1 Median : 2923.7
Mean : 13585.2 Mean : 13476.6 Mean : 11404.5
3rd Qu.: 9860.0 3rd Qu.: 9770.0 3rd Qu.: 8172.4
Max. :170787.7 Max. :174243.3 Max. :147125.9
2020 2019 2018 2017
Min. : 104.1 Min. : 93.3 Min. : 85.0 Min. : 67.9
1st Qu.: 430.4 1st Qu.: 461.6 1st Qu.: 434.7 1st Qu.: 351.5
Median : 2219.9 Median : 1751.8 Median : 1659.9 Median : 1156.8
Mean : 8714.4 Mean : 8282.8 Mean : 7774.1 Mean : 6370.7
3rd Qu.: 6733.2 3rd Qu.: 6483.0 3rd Qu.: 5910.9 3rd Qu.: 5398.6
Max. :115989.3 Max. :114573.1 Max. :104182.5 Max. :85093.3
2016 2015 2014 2013
Min. : 72.0 Min. : 72.1 Min. : 61.5 Min. : 66.5
1st Qu.: 295.7 1st Qu.: 370.0 1st Qu.: 311.5 1st Qu.: 307.6
Median : 1173.2 Median : 1549.8 Median : 1377.8 Median : 1177.2
Mean : 5541.1 Mean : 5569.5 Mean : 5011.7 Mean : 4548.1
3rd Qu.: 4588.9 3rd Qu.: 5121.5 3rd Qu.: 4416.3 3rd Qu.: 4545.9
Max. :70461.8 Max. :69881.1 Max. :62802.7 Max. :55949.9
2012 2011 2010 2009
Min. : 77.3 Min. : 33.1 Min. : 17.5 Length:68
1st Qu.: 279.5 1st Qu.: 233.1 1st Qu.: 262.3 Class :character
Median : 1008.1 Median : 1023.5 Median : 769.5 Mode :character
Mean : 4081.9 Mean : 3751.8 Mean : 3455.8
3rd Qu.: 3777.1 3rd Qu.: 3442.4 3rd Qu.: 3007.9
Max. :50496.3 Max. :48318.2 Max. :46254.5
2008 2007 2006 2005
Min. : 5.4 Min. : 1.8 Min. : 1.6 Length:68
1st Qu.: 204.1 1st Qu.: 179.8 1st Qu.: 146.8 Class :character
Median : 683.1 Median : 534.7 Median : 441.1 Mode :character
Mean : 3270.8 Mean : 2834.3 Mean : 2384.7
3rd Qu.: 2840.8 3rd Qu.: 2727.7 3rd Qu.: 2028.2
Max. :43935.5 Max. :39789.5 Max. :33098.0
2004 2003 2002 2001
Length:68 Length:68 Length:68 Length:68
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
2000
Length:68
Class :character
Mode :character
# A tibble: 0 × 25
# ℹ 25 variables: Data Series <chr>, 2023 <dbl>, 2022 <dbl>, 2021 <dbl>,
# 2020 <dbl>, 2019 <dbl>, 2018 <dbl>, 2017 <dbl>, 2016 <dbl>, 2015 <dbl>,
# 2014 <dbl>, 2013 <dbl>, 2012 <dbl>, 2011 <dbl>, 2010 <dbl>, 2009 <chr>,
# 2008 <dbl>, 2007 <dbl>, 2006 <dbl>, 2005 <chr>, 2004 <chr>, 2003 <chr>,
# 2002 <chr>, 2001 <chr>, 2000 <chr>
Data Series 2023 2022 2021 2020 2019
0 0 0 0 0 0
2018 2017 2016 2015 2014 2013
0 0 0 0 0 0
2012 2011 2010 2009 2008 2007
0 0 0 0 0 0
2006 2005 2004 2003 2002 2001
0 0 0 0 0 0
2000
0
replace_na() function to replace missing values with 0 in specified columns.tibble [68 × 25] (S3: tbl_df/tbl/data.frame)
$ Data Series: chr [1:68] "Asia" "Bangladesh" "Brunei Darussalam" "Cambodia" ...
$ 2023 : num [1:68] 170788 986 960 476 21686 ...
$ 2022 : num [1:68] 174243 904 891 390 21307 ...
$ 2021 : num [1:68] 147126 675 579 256 18126 ...
$ 2020 : num [1:68] 115989 582 533 250 15063 ...
$ 2019 : num [1:68] 114573 642 556 259 11273 ...
$ 2018 : num [1:68] 104183 664 496 290 10466 ...
$ 2017 : num [1:68] 85093 548 464 227 8240 ...
$ 2016 : num [1:68] 70462 483 420 196 7352 ...
$ 2015 : num [1:68] 69881 424 465 114 7260 ...
$ 2014 : num [1:68] 62802.7 384.3 520.4 86.3 5841.4 ...
$ 2013 : num [1:68] 55949.9 347.5 474.4 89.2 5184.6 ...
$ 2012 : num [1:68] 50496 349 464 115 5156 ...
$ 2011 : num [1:68] 48318 334 468 109 5105 ...
$ 2010 : num [1:68] 46254.5 272.6 508.1 78.4 4928.8 ...
$ 2009 : chr [1:68] "40325.300000000003" "209.3" "324.39999999999998" "99.3" ...
$ 2008 : num [1:68] 43936 265 273 140 4677 ...
$ 2007 : num [1:68] 39789.5 284 248.3 88.1 4429.7 ...
$ 2006 : num [1:68] 33098 286 121 111 3555 ...
$ 2005 : chr [1:68] "28070.400000000001" "274.60000000000002" "123.3" "77.099999999999994" ...
$ 2004 : chr [1:68] "25457.1" "188.9" "138.6" "68.7" ...
$ 2003 : chr [1:68] "18942" "157" "151.4" "45.3" ...
$ 2002 : chr [1:68] "17741.7" "118.3" "144.30000000000001" "37.9" ...
$ 2001 : chr [1:68] "16226.5" "121.8" "97.8" "29.8" ...
$ 2000 : chr [1:68] "15816.9" "89.7" "100.5" "18.399999999999999" ...
import_partners data structureRows: 68
Columns: 25
$ `Data Series` <chr> "Asia", "Bangladesh", "Brunei Darussalam", "Cambodia", "…
$ `2023` <dbl> 139497.1, 468.8, 196.8, 121.2, 20255.7, 13532.6, 3517.7,…
$ `2022` <dbl> 136729.5, 599.1, 83.5, 133.6, 19762.7, 12913.9, 3632.4, …
$ `2021` <dbl> 115022.1, 441.8, 53.6, 150.8, 20830.3, 10462.6, 3358.9, …
$ `2020` <dbl> 93351.0, 392.9, 69.7, 159.6, 14410.3, 8780.3, 2949.9, 22…
$ `2019` <dbl> 83242.3, 380.9, 49.2, 217.9, 13093.5, 7813.9, 2794.3, 23…
$ `2018` <dbl> 78004.9, 318.7, 82.2, 116.5, 12234.9, 6612.3, 2494.2, 28…
$ `2017` <dbl> 65654.5, 251.9, 73.5, 112.7, 10874.2, 5461.9, 2245.5, 32…
$ `2016` <chr> "55208.6", "233.9", "96.6", "92.6", "8845.4", "4549.6000…
$ `2015` <dbl> 52852.1, 192.7, 67.1, 82.1, 8664.1, 4035.6, 2170.2, 215.…
$ `2014` <dbl> 48565.4, 234.8, 53.1, 65.8, 7674.3, 3839.5, 1990.2, 148.…
$ `2013` <dbl> 40683.5, 176.8, 48.9, 63.5, 5096.7, 3555.7, 1920.0, 138.…
$ `2012` <dbl> 36390.3, 160.0, 58.3, 127.8, 4233.6, 3039.8, 1816.8, 113…
$ `2011` <dbl> 34902.8, 149.4, 41.3, 133.2, 3944.7, 2979.5, 1733.9, 74.…
$ `2010` <dbl> 31712.3, 141.0, 36.6, 110.7, 4279.6, 2700.1, 1780.2, 99.…
$ `2009` <dbl> 27360.1, 106.5, 40.2, 102.9, 3503.1, 2352.3, 1566.5, 62.…
$ `2008` <dbl> 28848.5, 82.4, 109.6, 112.4, 3623.9, 2345.5, 1673.7, 60.…
$ `2007` <dbl> 24679.0, 79.9, 101.2, 100.2, 3520.0, 1869.5, 1587.4, 47.…
$ `2006` <chr> "22188.6", "105.7", "49.6", "205.6", "2938.8", "1512.1",…
$ `2005` <chr> "19161.400000000001", "91.2", "36.299999999999997", "69.…
$ `2004` <chr> "18110.8", "82.2", "60.6", "58", "2222.6999999999998", "…
$ `2003` <chr> "12607.1", "70.900000000000006", "49.3", "34.5", "1685.2…
$ `2002` <chr> "10987.5", "77.5", "37.799999999999997", "30", "1585.9",…
$ `2001` <chr> "9702.7000000000007", "39.700000000000003", "29.4", "20.…
$ `2000` <chr> "8639.7999999999993", "38.799999999999997", "32.6", "9",…
# A tibble: 6 × 25
`Data Series` `2023` `2022` `2021` `2020` `2019` `2018` `2017` `2016` `2015`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 Asia 1.39e5 1.37e5 1.15e5 9.34e4 8.32e4 7.80e4 6.57e4 55208… 5.29e4
2 Bangladesh 4.69e2 5.99e2 4.42e2 3.93e2 3.81e2 3.19e2 2.52e2 233.9 1.93e2
3 Brunei Darussa… 1.97e2 8.35e1 5.36e1 6.97e1 4.92e1 8.22e1 7.35e1 96.6 6.71e1
4 Cambodia 1.21e2 1.34e2 1.51e2 1.60e2 2.18e2 1.16e2 1.13e2 92.6 8.21e1
5 Hong Kong 2.03e4 1.98e4 2.08e4 1.44e4 1.31e4 1.22e4 1.09e4 8845.4 8.66e3
6 India 1.35e4 1.29e4 1.05e4 8.78e3 7.81e3 6.61e3 5.46e3 4549.… 4.04e3
# ℹ 15 more variables: `2014` <dbl>, `2013` <dbl>, `2012` <dbl>, `2011` <dbl>,
# `2010` <dbl>, `2009` <dbl>, `2008` <dbl>, `2007` <dbl>, `2006` <chr>,
# `2005` <chr>, `2004` <chr>, `2003` <chr>, `2002` <chr>, `2001` <chr>,
# `2000` <chr>
Data Series 2023 2022 2021
Length:68 Min. : 93.1 Min. : 83.5 Min. : 53.6
Class :character 1st Qu.: 608.7 1st Qu.: 565.6 1st Qu.: 448.8
Mode :character Median : 2333.1 Median : 2153.3 Median : 1762.8
Mean : 11464.3 Mean : 10862.6 Mean : 9301.5
3rd Qu.: 8176.1 3rd Qu.: 8600.9 3rd Qu.: 6637.0
Max. :139497.1 Max. :136729.5 Max. :115022.1
2020 2019 2018 2017
Min. : 69.7 Min. : 49.2 Min. : 82.2 Min. : 73.5
1st Qu.: 344.2 1st Qu.: 309.6 1st Qu.: 310.2 1st Qu.: 274.6
Median : 1669.7 Median : 1556.3 Median : 1565.8 Median : 1411.4
Mean : 7982.5 Mean : 7263.2 Mean : 7045.8 Mean : 6413.4
3rd Qu.: 5550.0 3rd Qu.: 5604.6 3rd Qu.: 5620.9 3rd Qu.: 6136.3
Max. :93351.0 Max. :83242.3 Max. :78004.9 Max. :65654.5
2016 2015 2014 2013
Length:68 Min. : 34.3 Min. : 40.3 Min. : 48.9
Class :character 1st Qu.: 217.2 1st Qu.: 250.5 1st Qu.: 192.3
Mode :character Median : 1142.8 Median : 1000.9 Median : 901.6
Mean : 5567.2 Mean : 5337.7 Mean : 4623.7
3rd Qu.: 4324.4 3rd Qu.: 5590.8 3rd Qu.: 5151.0
Max. :52852.1 Max. :50558.2 Max. :44479.7
2012 2011 2010 2009
Min. : 43.5 Min. : 27.2 Min. : 4.7 Min. : 4.50
1st Qu.: 160.7 1st Qu.: 141.7 1st Qu.: 109.5 1st Qu.: 98.22
Median : 628.3 Median : 606.6 Median : 438.5 Median : 380.20
Mean : 4006.5 Mean : 3466.0 Mean : 3164.2 Mean : 2726.97
3rd Qu.: 3773.5 3rd Qu.: 3178.5 3rd Qu.: 2564.5 3rd Qu.: 2279.35
Max. :36390.3 Max. :34902.8 Max. :31712.3 Max. :27360.10
2008 2007 2006 2005
Min. : 4.50 Min. : 8.90 Length:68 Length:68
1st Qu.: 99.97 1st Qu.: 85.83 Class :character Class :character
Median : 407.45 Median : 280.55 Mode :character Mode :character
Mean : 2769.65 Mean : 2348.02
3rd Qu.: 2253.03 3rd Qu.: 1720.08
Max. :28848.50 Max. :24679.00
2004 2003 2002 2001
Length:68 Length:68 Length:68 Length:68
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
2000
Length:68
Class :character
Mode :character
# A tibble: 0 × 25
# ℹ 25 variables: Data Series <chr>, 2023 <dbl>, 2022 <dbl>, 2021 <dbl>,
# 2020 <dbl>, 2019 <dbl>, 2018 <dbl>, 2017 <dbl>, 2016 <chr>, 2015 <dbl>,
# 2014 <dbl>, 2013 <dbl>, 2012 <dbl>, 2011 <dbl>, 2010 <dbl>, 2009 <dbl>,
# 2008 <dbl>, 2007 <dbl>, 2006 <chr>, 2005 <chr>, 2004 <chr>, 2003 <chr>,
# 2002 <chr>, 2001 <chr>, 2000 <chr>
Data Series 2023 2022 2021 2020 2019
0 0 0 0 0 0
2018 2017 2016 2015 2014 2013
0 0 0 0 0 0
2012 2011 2010 2009 2008 2007
0 0 0 0 0 0
2006 2005 2004 2003 2002 2001
0 0 0 0 0 0
2000
0
replace_na() function to replace missing values with 0 in specified columns.tibble [68 × 25] (S3: tbl_df/tbl/data.frame)
$ Data Series: chr [1:68] "Asia" "Bangladesh" "Brunei Darussalam" "Cambodia" ...
$ 2023 : num [1:68] 139497 469 197 121 20256 ...
$ 2022 : num [1:68] 136729.5 599.1 83.5 133.6 19762.7 ...
$ 2021 : num [1:68] 115022.1 441.8 53.6 150.8 20830.3 ...
$ 2020 : num [1:68] 93351 392.9 69.7 159.6 14410.3 ...
$ 2019 : num [1:68] 83242.3 380.9 49.2 217.9 13093.5 ...
$ 2018 : num [1:68] 78004.9 318.7 82.2 116.5 12234.9 ...
$ 2017 : num [1:68] 65654.5 251.9 73.5 112.7 10874.2 ...
$ 2016 : chr [1:68] "55208.6" "233.9" "96.6" "92.6" ...
$ 2015 : num [1:68] 52852.1 192.7 67.1 82.1 8664.1 ...
$ 2014 : num [1:68] 48565.4 234.8 53.1 65.8 7674.3 ...
$ 2013 : num [1:68] 40683.5 176.8 48.9 63.5 5096.7 ...
$ 2012 : num [1:68] 36390.3 160 58.3 127.8 4233.6 ...
$ 2011 : num [1:68] 34902.8 149.4 41.3 133.2 3944.7 ...
$ 2010 : num [1:68] 31712.3 141 36.6 110.7 4279.6 ...
$ 2009 : num [1:68] 27360.1 106.5 40.2 102.9 3503.1 ...
$ 2008 : num [1:68] 28848.5 82.4 109.6 112.4 3623.9 ...
$ 2007 : num [1:68] 24679 79.9 101.2 100.2 3520 ...
$ 2006 : chr [1:68] "22188.6" "105.7" "49.6" "205.6" ...
$ 2005 : chr [1:68] "19161.400000000001" "91.2" "36.299999999999997" "69.900000000000006" ...
$ 2004 : chr [1:68] "18110.8" "82.2" "60.6" "58" ...
$ 2003 : chr [1:68] "12607.1" "70.900000000000006" "49.3" "34.5" ...
$ 2002 : chr [1:68] "10987.5" "77.5" "37.799999999999997" "30" ...
$ 2001 : chr [1:68] "9702.7000000000007" "39.700000000000003" "29.4" "20.9" ...
$ 2000 : chr [1:68] "8639.7999999999993" "38.799999999999997" "32.6" "9" ...
export_partners and import_partnersTo ensure consistency, column names were converted to lowercase, and spaces were replaced with underscores. Since numerical values were stored as different data types in the datasets, all year columns (2000-2023) were converted to numeric format to prevent data type mismatches. Additionally, any missing (NA) values or dashes (“-”) were replaced with 0 to ensure data integrity. A “category” column was added to distinguish between Exports and Imports. Finally, the two datasets were combined using bind_rows().
# Ensure column names are consistent
colnames(export_partners) <- tolower(gsub(" ", "_", colnames(export_partners)))
colnames(import_partners) <- tolower(gsub(" ", "_", colnames(import_partners)))
# Convert all numeric year columns to numeric type safely by handling non-numeric values and replacing NA and "-" with 0
export_partners <- export_partners %>% mutate(across(matches("^\\d{4}$"), ~suppressWarnings(as.numeric(.)))) %>%
mutate(across(matches("^\\d{4}$"), ~replace(., is.na(.) | . == "-", 0)))
import_partners <- import_partners %>% mutate(across(matches("^\\d{4}$"), ~suppressWarnings(as.numeric(.)))) %>%
mutate(across(matches("^\\d{4}$"), ~replace(., is.na(.) | . == "-", 0)))
# Add a column to indicate the category (Exports, Imports)
export_partners <- export_partners %>% mutate(category = "Exports")
import_partners <- import_partners %>% mutate(category = "Imports")
# Combine both datasets
combined_data <- bind_rows(export_partners, import_partners)combined_data data structureThe combined_data tibble contains 26 attributes, as shown below.
The following preprocessing checks were conducted as part of data preparation:
combined_data dataset using glimpse() and str()duplicated() in the datasetcolSums(is.na())Rows: 136
Columns: 26
$ data_series <chr> "Asia", "Bangladesh", "Brunei Darussalam", "Cambodia", "Ho…
$ `2023` <dbl> 170787.7, 985.7, 960.1, 475.6, 21686.1, 9909.4, 8542.6, 81…
$ `2022` <dbl> 174243.3, 903.9, 891.2, 389.5, 21307.3, 9730.1, 8366.6, 58…
$ `2021` <dbl> 147125.9, 674.8, 579.0, 256.0, 18126.2, 7817.0, 6222.5, 50…
$ `2020` <dbl> 115989.3, 581.6, 533.4, 250.3, 15063.2, 6337.6, 5351.4, 35…
$ `2019` <dbl> 114573.1, 642.4, 556.4, 259.1, 11273.2, 6383.4, 6781.8, 25…
$ `2018` <dbl> 104182.5, 664.0, 495.9, 290.4, 10466.3, 5957.8, 5895.2, 24…
$ `2017` <dbl> 85093.3, 547.7, 463.9, 226.8, 8239.5, 5326.3, 5615.7, 254.…
$ `2016` <dbl> 70461.8, 482.6, 420.5, 196.1, 7351.8, 4422.4, 5088.6, 231.…
$ `2015` <dbl> 69881.1, 424.3, 465.4, 114.3, 7260.0, 4236.4, 5032.0, 175.…
$ `2014` <dbl> 62802.7, 384.3, 520.4, 86.3, 5841.4, 4276.0, 4837.1, 159.2…
$ `2013` <dbl> 55949.9, 347.5, 474.4, 89.2, 5184.6, 4857.4, 4535.8, 155.1…
$ `2012` <dbl> 50496.3, 349.1, 464.1, 115.0, 5156.3, 4375.0, 4062.7, 146.…
$ `2011` <dbl> 48318.2, 333.5, 468.5, 109.2, 5104.8, 4244.7, 3537.8, 132.…
$ `2010` <dbl> 46254.5, 272.6, 508.1, 78.4, 4928.8, 4089.3, 3291.1, 114.2…
$ `2009` <dbl> 40325.3, 209.3, 324.4, 99.3, 4344.5, 3712.2, 3642.6, 89.2,…
$ `2008` <dbl> 43935.5, 264.8, 272.6, 139.7, 4676.6, 3502.5, 3321.9, 93.3…
$ `2007` <dbl> 39789.5, 284.0, 248.3, 88.1, 4429.7, 3191.3, 3026.0, 136.5…
$ `2006` <dbl> 33098.0, 285.5, 120.6, 110.8, 3555.3, 2448.5, 2969.8, 102.…
$ `2005` <dbl> 28070.4, 274.6, 123.3, 77.1, 2790.5, 2084.6, 2364.4, 90.4,…
$ `2004` <dbl> 25457.1, 188.9, 138.6, 68.7, 2313.7, 1742.1, 2231.8, 87.6,…
$ `2003` <dbl> 18942.0, 157.0, 151.4, 45.3, 2059.7, 1241.4, 1767.5, 63.0,…
$ `2002` <dbl> 17741.7, 118.3, 144.3, 37.9, 1912.0, 907.6, 1801.4, 51.5, …
$ `2001` <dbl> 16226.5, 121.8, 97.8, 29.8, 1866.5, 735.4, 1707.5, 24.6, 3…
$ `2000` <dbl> 15816.9, 89.7, 100.5, 18.4, 1860.6, 746.8, 1607.9, 28.9, 3…
$ category <chr> "Exports", "Exports", "Exports", "Exports", "Exports", "Ex…
# A tibble: 6 × 26
data_series `2023` `2022` `2021` `2020` `2019` `2018` `2017` `2016` `2015`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Asia 1.71e5 1.74e5 1.47e5 1.16e5 1.15e5 1.04e5 85093. 70462. 69881.
2 Bangladesh 9.86e2 9.04e2 6.75e2 5.82e2 6.42e2 6.64e2 548. 483. 424.
3 Brunei Darussa… 9.60e2 8.91e2 5.79e2 5.33e2 5.56e2 4.96e2 464. 420. 465.
4 Cambodia 4.76e2 3.89e2 2.56e2 2.50e2 2.59e2 2.90e2 227. 196. 114.
5 Hong Kong 2.17e4 2.13e4 1.81e4 1.51e4 1.13e4 1.05e4 8240. 7352. 7260
6 India 9.91e3 9.73e3 7.82e3 6.34e3 6.38e3 5.96e3 5326. 4422. 4236.
# ℹ 16 more variables: `2014` <dbl>, `2013` <dbl>, `2012` <dbl>, `2011` <dbl>,
# `2010` <dbl>, `2009` <dbl>, `2008` <dbl>, `2007` <dbl>, `2006` <dbl>,
# `2005` <dbl>, `2004` <dbl>, `2003` <dbl>, `2002` <dbl>, `2001` <dbl>,
# `2000` <dbl>, category <chr>
data_series 2023 2022 2021
Length:136 Min. : 93.1 Min. : 83.5 Min. : 53.6
Class :character 1st Qu.: 701.3 1st Qu.: 596.2 1st Qu.: 492.4
Mode :character Median : 2595.8 Median : 2330.3 Median : 1957.3
Mean : 12524.7 Mean : 12169.6 Mean : 10353.0
3rd Qu.: 8674.3 3rd Qu.: 8664.1 3rd Qu.: 7272.9
Max. :170787.7 Max. :174243.3 Max. :147125.9
2020 2019 2018 2017
Min. : 69.7 Min. : 49.2 Min. : 82.2 Min. : 67.9
1st Qu.: 388.6 1st Qu.: 408.8 1st Qu.: 378.1 1st Qu.: 311.0
Median : 1725.6 Median : 1685.2 Median : 1597.5 Median : 1364.0
Mean : 8348.4 Mean : 7773.0 Mean : 7410.0 Mean : 6392.0
3rd Qu.: 6002.1 3rd Qu.: 6162.6 3rd Qu.: 5910.9 3rd Qu.: 5726.2
Max. :115989.3 Max. :114573.1 Max. :104182.5 Max. :85093.3
2016 2015 2014 2013
Min. : 0 Min. : 34.3 Min. : 40.3 Min. : 48.9
1st Qu.: 269 1st Qu.: 283.6 1st Qu.: 276.0 1st Qu.: 239.6
Median : 1188 Median : 1250.4 Median : 1148.3 Median : 1018.6
Mean : 5509 Mean : 5568.4 Mean : 5174.7 Mean : 4585.9
3rd Qu.: 4716 3rd Qu.: 5071.8 3rd Qu.: 4882.9 3rd Qu.: 4602.4
Max. :70462 Max. :69881.1 Max. :62802.7 Max. :55949.9
2012 2011 2010 2009
Min. : 43.5 Min. : 27.2 Min. : 4.7 Min. : 0.0
1st Qu.: 202.4 1st Qu.: 193.8 1st Qu.: 146.8 1st Qu.: 127.5
Median : 809.0 Median : 746.0 Median : 599.6 Median : 514.7
Mean : 4044.2 Mean : 3608.9 Mean : 3310.0 Mean : 2888.4
3rd Qu.: 3777.1 3rd Qu.: 3442.4 3rd Qu.: 2625.2 3rd Qu.: 2342.0
Max. :50496.3 Max. :48318.2 Max. :46254.5 Max. :40325.3
2008 2007 2006 2005
Min. : 4.5 Min. : 1.8 Min. : 0.0 Min. : 0.00
1st Qu.: 152.4 1st Qu.: 112.6 1st Qu.: 114.9 1st Qu.: 72.92
Median : 549.5 Median : 441.4 Median : 354.6 Median : 321.20
Mean : 3020.2 Mean : 2591.2 Mean : 2250.5 Mean : 1878.82
3rd Qu.: 2357.7 3rd Qu.: 1896.8 3rd Qu.: 1785.8 3rd Qu.: 1518.40
Max. :43935.5 Max. :39789.5 Max. :33098.0 Max. :28070.40
2004 2003 2002 2001
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.: 55.27 1st Qu.: 34.12 1st Qu.: 31.05 1st Qu.: 22.55
Median : 257.65 Median : 195.60 Median : 127.45 Median : 107.30
Mean : 1699.77 Mean : 1350.57 Mean : 1132.43 Mean : 1019.79
3rd Qu.: 1530.25 3rd Qu.: 1140.42 3rd Qu.: 890.12 3rd Qu.: 767.73
Max. :25457.10 Max. :18942.00 Max. :17741.70 Max. :16226.50
2000 category
Min. : 0.00 Length:136
1st Qu.: 19.18 Class :character
Median : 98.50 Mode :character
Mean : 962.32
3rd Qu.: 679.02
Max. :15816.90
# A tibble: 0 × 26
# ℹ 26 variables: data_series <chr>, 2023 <dbl>, 2022 <dbl>, 2021 <dbl>,
# 2020 <dbl>, 2019 <dbl>, 2018 <dbl>, 2017 <dbl>, 2016 <dbl>, 2015 <dbl>,
# 2014 <dbl>, 2013 <dbl>, 2012 <dbl>, 2011 <dbl>, 2010 <dbl>, 2009 <dbl>,
# 2008 <dbl>, 2007 <dbl>, 2006 <dbl>, 2005 <dbl>, 2004 <dbl>, 2003 <dbl>,
# 2002 <dbl>, 2001 <dbl>, 2000 <dbl>, category <chr>
data_series 2023 2022 2021 2020 2019
0 0 0 0 0 0
2018 2017 2016 2015 2014 2013
0 0 0 0 0 0
2012 2011 2010 2009 2008 2007
0 0 0 0 0 0
2006 2005 2004 2003 2002 2001
0 0 0 0 0 0
2000 category
0 0
replace_na() function to replace missing values with 0 in specified columns.tibble [136 × 26] (S3: tbl_df/tbl/data.frame)
$ data_series: chr [1:136] "Asia" "Bangladesh" "Brunei Darussalam" "Cambodia" ...
$ 2023 : num [1:136] 170788 986 960 476 21686 ...
$ 2022 : num [1:136] 174243 904 891 390 21307 ...
$ 2021 : num [1:136] 147126 675 579 256 18126 ...
$ 2020 : num [1:136] 115989 582 533 250 15063 ...
$ 2019 : num [1:136] 114573 642 556 259 11273 ...
$ 2018 : num [1:136] 104183 664 496 290 10466 ...
$ 2017 : num [1:136] 85093 548 464 227 8240 ...
$ 2016 : num [1:136] 70462 483 420 196 7352 ...
$ 2015 : num [1:136] 69881 424 465 114 7260 ...
$ 2014 : num [1:136] 62802.7 384.3 520.4 86.3 5841.4 ...
$ 2013 : num [1:136] 55949.9 347.5 474.4 89.2 5184.6 ...
$ 2012 : num [1:136] 50496 349 464 115 5156 ...
$ 2011 : num [1:136] 48318 334 468 109 5105 ...
$ 2010 : num [1:136] 46254.5 272.6 508.1 78.4 4928.8 ...
$ 2009 : num [1:136] 40325.3 209.3 324.4 99.3 4344.5 ...
$ 2008 : num [1:136] 43936 265 273 140 4677 ...
$ 2007 : num [1:136] 39789.5 284 248.3 88.1 4429.7 ...
$ 2006 : num [1:136] 33098 286 121 111 3555 ...
$ 2005 : num [1:136] 28070.4 274.6 123.3 77.1 2790.5 ...
$ 2004 : num [1:136] 25457.1 188.9 138.6 68.7 2313.7 ...
$ 2003 : num [1:136] 18942 157 151.4 45.3 2059.7 ...
$ 2002 : num [1:136] 17741.7 118.3 144.3 37.9 1912 ...
$ 2001 : num [1:136] 16226.5 121.8 97.8 29.8 1866.5 ...
$ 2000 : num [1:136] 15816.9 89.7 100.5 18.4 1860.6 ...
$ category : chr [1:136] "Exports" "Exports" "Exports" "Exports" ...
A new “category” column will be added to classify each trading partner as a Country, Region, or Economic & Political Union. This helps organize the data, making it easier to analyze trade patterns across different entity types.
| types | data_series |
|---|---|
| Country | Bangladesh, Brunei Darussalam, Cambodia, Hong Kong, India, Indonesia, Israel, Japan, Kuwait, Mainland China, Malaysia, Myanmar, Pakistan, Philippines, Qatar, Republic Of Korea, Saudi Arabia, Sri Lanka, Taiwan Thailand,Turkiye United Arab Emirates, Vietnam, Belgium, Cyprus Denmark, Finland France Germany, Greece Ireland Italy, Luxembourg, Netherlands, Norway, Portugal, Russian Federation, Spain, Sweden, Switzerland, United Kingdom, United States Of America, Canada, Australia, Marshall Islands, New Zealand, Papua New Guinea, Bermuda, Brazil, British Virgin Islands, Cayman Islands, Chile, Mexico Panama, Peru, Egypt, Liberia, Mauritius, Nigeria South Africa |
| Region | Asia, Europe, North America, Oceania, South And Central America And The Caribbean, Africa |
| Economic & Political Union | ASEAN, European Union (EU-27) |
# Load the categorization mapping from Excel
categorization_mapping <- read_excel("data/categorizationmapping.xlsx")
# Ensure column names are consistent
colnames(categorization_mapping) <- tolower(gsub(" ", "_", colnames(categorization_mapping)))
# Merge only the type into combined_data
combined_data <- combined_data %>% left_join(categorization_mapping, by = "data_series")To calculate the total trade values for each trading partner, the dataset was grouped by “data_series”, and trade values from 2000 to 2023 were summed using summarise(), ensuring missing values were ignored with na.rm = TRUE. A “category” column was assigned the value “Total” to indicate aggregated trade figures. The computed totals were then appended to the original dataset using bind_rows().
# Summing up the values for each trading partner across all years (2000-2023)
summed_data <- combined_data %>%
group_by(data_series, types) %>%
summarise(across(matches("^\\d{4}$"), sum, na.rm = TRUE), .groups = "drop") %>%
mutate(category = "Total")
# Add the total category to the combined dataset
final_data <- bind_rows(combined_data, summed_data)
# Reorder columns to move category to the second position
final_data <- final_data %>% select(data_series, category, types, everything())
# Display a preview of the merged data
print(head(final_data))# A tibble: 6 × 27
data_series category types `2023` `2022` `2021` `2020` `2019` `2018` `2017`
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Asia Exports Regi… 1.71e5 1.74e5 1.47e5 1.16e5 1.15e5 1.04e5 85093.
2 Bangladesh Exports Coun… 9.86e2 9.04e2 6.75e2 5.82e2 6.42e2 6.64e2 548.
3 Brunei Daruss… Exports Coun… 9.60e2 8.91e2 5.79e2 5.33e2 5.56e2 4.96e2 464.
4 Cambodia Exports Coun… 4.76e2 3.89e2 2.56e2 2.50e2 2.59e2 2.90e2 227.
5 Hong Kong Exports Coun… 2.17e4 2.13e4 1.81e4 1.51e4 1.13e4 1.05e4 8240.
6 India Exports Coun… 9.91e3 9.73e3 7.82e3 6.34e3 6.38e3 5.96e3 5326.
# ℹ 17 more variables: `2016` <dbl>, `2015` <dbl>, `2014` <dbl>, `2013` <dbl>,
# `2012` <dbl>, `2011` <dbl>, `2010` <dbl>, `2009` <dbl>, `2008` <dbl>,
# `2007` <dbl>, `2006` <dbl>, `2005` <dbl>, `2004` <dbl>, `2003` <dbl>,
# `2002` <dbl>, `2001` <dbl>, `2000` <dbl>
Key makeover changes:
1️⃣ Breaking down trade data by types - e.g.: countries/ regions/ economic & political union
Original visualization grouped countries, regions, economic & political unions making trade relationships harder to interpret.
Revised visualization will categorize data into different types, ensuring clarity.
2️⃣ Pie chart for region-level aggregation:

library(ggplot2)
library(dplyr)
library(viridis)
library(ggrepel)
# Filter the trade data for regions
region_trade_data <- final_data %>%
filter(types == "Region", category == "Total") %>%
select(region = data_series, trade_value = `2023`)
# Ensure correct ordering based on trade value
region_trade_data <- region_trade_data %>%
arrange(desc(trade_value))
# Compute cumulative sum for positioning
region_trade_data <- region_trade_data %>%
mutate(label = paste0(region, "\n$", format(trade_value, big.mark = ",")),
pos = cumsum(trade_value) - (0.5 * trade_value))
# Create Static Pie Chart with Labels Outside
static_pie_chart <- ggplot(region_trade_data, aes(x = "", y = trade_value, fill = trade_value)) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar(theta = "y") +
scale_fill_viridis(option = "viridis", direction = -1) +
geom_text_repel(aes(y = pos, label = label), size = 3, color = "black", nudge_x = 1, box.padding = 0.5) +
theme_void() +
labs(
title = "Major Trading Partners for Trade in Services, 2023",
subtitle = "Total Trade Volume Distribution by Region",
fill = "Trade Value (SGD)"
) +
theme(
legend.position = "right",
plot.title = element_text(hjust = 0, face = "bold", size = 14), # Centered, bold title
plot.subtitle = element_text(hjust = 0, size = 12) # Centered subtitle
)
# Show the static pie chart
print(static_pie_chart)Key makeover changes:
1️⃣ Enhanced trade value representation with a geo-spatial context -
Original uses static infographics with flag-based indicators, the revised visualization presents an intecative world map, allowing for dynamic exploration.
Color gradient (from yellow to purple) effectively communciates the trade value inensity in SGD
library(ggplot2)
library(dplyr)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
library(viridis)
library(plotly)
# Load world map data
world <- ne_countries(scale = "medium", returnclass = "sf")
# Filter the trade data (assuming it's already in your R environment as `final_data`)
trade_data <- final_data %>%
filter(types == "Country", category == "Total") %>%
select(region = data_series, trade_value = `2023`)
# Standardize region names
trade_data$region <- tolower(trimws(trade_data$region))
world$name <- tolower(trimws(world$name))
# Correct mismatched country names
trade_data$region <- recode(trade_data$region,
"mainland china" = "china",
"republic of korea" = "south korea",
"russian federation" = "russia",
"turkiye" = "turkey",
"british virgin islands" = "u.s. virgin is.",
"cayman islands" = "cayman is.",
"brunei darussalam" = "brunei",
"marshall islands" = "marshall is.")
# Merge world map with trade data
world <- world %>%
left_join(trade_data, by = c("name" = "region"))
# Create a ggplot object with title and subtitle
ggplot_map <- ggplot(data = world) +
geom_sf(aes(fill = trade_value, text = paste("Country:", name, "<br>Trade Value: $", trade_value)),
color = "black", size = 0.2) +
scale_fill_viridis(option = "viridis", direction = -1, na.value = "gray90") +
theme_minimal() +
labs(
title = "Major Trading Partners for Trade in Services, 2023",
subtitle = "Total Trade Value Distribution by Countries",
fill = "Trade Value (SGD)"
) +
theme(
plot.title = element_text(hjust = 0, face = "bold", size = 14),
plot.subtitle = element_text(hjust = 0, size = 12)
)
# Convert to interactive Plotly map and manually add subtitle
plotly_map <- ggplotly(ggplot_map, tooltip = "text") %>%
layout(
annotations = list(
list(
text = "<b>Total Trade Value Distribution by Countries</b>",
x = 0, # Left align
y = 1.03, # Position above plot
xref = "paper",
yref = "paper",
showarrow = FALSE,
font = list(size = 12)
)
)
)
# Show interactive map
plotly_mapKey observations:
Total trade vol()
Top 5 trade partners: United States, Japan, Mainland China, Australia, and United Kingdom
Trends: Significant growth post-2010, with accelerated trade activities from 2018 onward, with Asia and North America dominating trade volume.
Exports()
Imports()



library(ggplot2)
library(dplyr)
library(tidyr)
library(viridis)
# Filter trade data for countries
trade_data <- final_data %>%
filter(types == "Country", category == "Imports") %>%
select(country = data_series, `2000`:`2023`) # Select years from 2000 to 2023
# Reshape data from wide to long format
trade_data_long <- trade_data %>%
pivot_longer(cols = `2000`:`2023`, names_to = "year", values_to = "trade_value") %>%
mutate(year = as.numeric(year)) # Convert year to numeric
# Calculate total trade value per country and reorder factors (highest at the top)
trade_data_long <- trade_data_long %>%
group_by(country) %>%
mutate(total_trade_value = sum(trade_value, na.rm = TRUE)) %>%
ungroup() %>%
mutate(country = fct_reorder(country, total_trade_value, .desc = TRUE))
# Create heatmap-style plot using geom_tile()
hori_plot <- ggplot(trade_data_long, aes(x = year, y = country, fill = trade_value)) +
geom_tile(color = "white") + # Add white borders between tiles
scale_fill_viridis_c(option = "magma", na.value = "gray90") + # Color gradient for intensity
labs(
title = "Total Trade Value Trends (2000-202f3)",
subtitle = "Major Trading Partners for Trade in Services",
x = "Year",
y = "Country",
fill = "Trade Value (SGD)"
) +
theme_minimal() +
theme(
axis.text.y = element_text(size = 8), # Adjust country label size
axis.text.x = element_text(angle = 45, hjust = 1), # Rotate year labels
plot.title = element_text(hjust = 0, face = "bold", size = 14), # Left-aligned title
plot.subtitle = element_text(hjust = 0, size = 12) # Left-aligned subtitle
)
# Show plot
print(hori_plot)In the below code, we will explore the tidymodels approach in time series forecasting.
# Load all necessary packages
library(tidyverse)
library(modeltime)
library(timetk)
library(lubridate)
library(rsample)
library(parsnip)
library(recipes)
library(workflows)
library(yardstick)
# Convert wide format to long format
long_data <- final_data %>%
pivot_longer(cols = matches("^\\d{4}$"), names_to = "year", values_to = "value") %>%
mutate(year = as.integer(year))
# Convert year to Date format
long_data <- long_data %>%
mutate(date = make_date(year = year, month = 1, day = 1)) %>%
select(data_series, category, types, date, value)
# Filter for a specific country and only "Exports"
selected_series <- long_data %>%
filter(data_series == "United States Of America", category == "Exports") %>%
select(date, value) %>%
arrange(date)
# Remove duplicates if any
selected_series <- selected_series %>%
distinct(date, .keep_all = TRUE)
# Perform a time-based split (80% training, 20% testing)
splits <- initial_time_split(selected_series, prop = 0.8)
cat("The training dataset contains", nrow(training(splits)), "observations.\n")The training dataset contains 19 observations.
The testing dataset consists of 5 observations.
In the code below, we will fit four models: - Error-Trend-Season (ETS) model by using exp_smoothing() - Auto ARIMA model by using arima_reg() - Boosted Auto ARIMA by using arima_boost() - Prophet model by using prophet_reg()
model_fit_ets <- exp_smoothing() %>%
set_engine("ets") %>%
fit(value ~ date, data = training(splits))
model_fit_arima <- arima_reg() %>%
set_engine("auto_arima") %>%
fit(value ~ date, data = training(splits))
model_fit_arima_boosted <- arima_boost(
min_n = 2,
learn_rate = 0.015) %>%
set_engine("auto_arima_xgboost") %>%
fit(value ~ date, data = training(splits))
model_fit_prophet <- prophet_reg() %>%
set_engine("prophet") %>%
fit(value ~ date, data = training(splits))Next, we will use modeltime_table of modeltime package to add each of the models to a Modeltime Table.
# Modeltime Table
# A tibble: 4 × 3
.model_id .model .model_desc
<int> <list> <chr>
1 1 <fit[+]> ETS(M,A,N)
2 2 <fit[+]> ARIMA(0,1,0) WITH DRIFT
3 3 <fit[+]> ARIMA(0,1,0) WITH DRIFT
4 4 <fit[+]> PROPHET
We will then use the modeltime_calibrate() to add a new column called .calibrate_data into the newly created models_tbl tibble data table.
# Modeltime Table
# A tibble: 4 × 5
.model_id .model .model_desc .type .calibration_data
<int> <list> <chr> <chr> <list>
1 1 <fit[+]> ETS(M,A,N) Test <tibble [5 × 4]>
2 2 <fit[+]> ARIMA(0,1,0) WITH DRIFT Test <tibble [5 × 4]>
3 3 <fit[+]> ARIMA(0,1,0) WITH DRIFT Test <tibble [5 × 4]>
4 4 <fit[+]> PROPHET Test <tibble [5 × 4]>
We will use two way to assess the accuracy of the models - by (1) means of accuracy metrics, (2) visualization
modeltime_accuracy() of modeltime package is used compute the accuracy metrics. Then, table_modeltime_accuracy() is used to present the accuracy metrics in tabular form.
| Accuracy Table | ||||||||
|---|---|---|---|---|---|---|---|---|
| .model_id | .model_desc | .type | mae | mape | mase | smape | rmse | rsq |
| 1 | ETS(M,A,N) | Test | 16181.62 | 31.88 | 1.49 | 40.03 | 19413.03 | 0.73 |
| 2 | ARIMA(0,1,0) WITH DRIFT | Test | 13059.38 | 24.56 | 1.20 | 30.18 | 16738.06 | 0.73 |
| 3 | ARIMA(0,1,0) WITH DRIFT | Test | 13059.38 | 24.56 | 1.20 | 30.18 | 16738.06 | 0.73 |
| 4 | PROPHET | Test | 16031.61 | 31.46 | 1.48 | 39.53 | 19329.48 | 0.61 |
We can also use the interactive plotly visualization to assess the accuracy of the models.
Next, we refit the models to the full dataset using modeltime_refit() and forecast them forward.We can use modeltime_refit() to refit the forecasting models with the full data.
Then, modeltime_forecast() is used to forecast to a selected future time period, in this example 10 years.
refit_tbl <- calibration_tbl %>%
modeltime_refit(data = selected_series) # Now trained on full dataset
forecast_tbl <- refit_tbl %>%
modeltime_forecast(
h = 10, # Forecast for 10 years
actual_data = selected_series,
keep_data = TRUE # Keep historical data for better visualization
)
forecast_tbl %>%
plot_modeltime_forecast(
.legend_max_width = 25,
.interactive = TRUE,
.plotly_slider = TRUE
)| Accuracy Table | ||||||||
|---|---|---|---|---|---|---|---|---|
| .model_id | .model_desc | .type | mae | mape | mase | smape | rmse | rsq |
| 1 | ETS(M,A,N) | Test | 16181.62 | 31.88 | 1.49 | 40.03 | 19413.03 | 0.73 |
| 2 | UPDATE: ARIMA(0,1,0)(0,0,1)[5] WITH DRIFT | Test | 13059.38 | 24.56 | 1.20 | 30.18 | 16738.06 | 0.73 |
| 3 | UPDATE: ARIMA(0,1,0)(0,0,1)[5] WITH DRIFT | Test | 13059.38 | 24.56 | 1.20 | 30.18 | 16738.06 | 0.73 |
| 4 | PROPHET | Test | 16031.61 | 31.46 | 1.48 | 39.53 | 19329.48 | 0.61 |
---
title: "Take-home Exercise 02"
author: "Andrea Yeo"
date-modified: "last-modified"
date: "February 24, 2025"
execute:
echo: true
eval: true
warning: false
freeze: true
---
# Be Tradewise or Otherwise
[***All values are in Million Dollars***]{style="font-size: 16px; color: blue;"}
## 1. Overview
### 1.1 Background
Since Mr. Donald Trump assumed office as the President of the United States on January 20, 2025, global trade has been a highly scrutinized topic. Given Singapore's role as a key global trade hub, understanding its trade dynamics is crucial in assessing the potential impact of shifting geopolitical and economic policies.
In this take-home exercise, we apply newly acquired data visualization and analytical techniques to explore **Singapore’s Trade in Services** - *Singapore's exports and imports of services between Singapore and the rest of the world.* By leveraging statistical tools and visualization methods, this study aims to uncover **key trends, service category distributions, and trade imbalances** while providing deeper insights into **how different service sectors contribute to Singapore’s economy.**
### 1.2 Research objective
Using data from the [Department of Statistics Singapore, DOS](https://www.singstat.gov.sg/) on [Trade In Services By Services Category](https://www.singstat.gov.sg/find-data/search-by-theme/trade-and-investment/trade-in-services/latest-data), this analysis applies Exploratory Data (EDA) and data visualization techniques to:
- **Assess three existing visualizations** from [this page](https://www.singstat.gov.sg/modules/infographics/singapore-international-trade), identifying each of these visualization pros and cons.
- **Redesign and enhance these visualizations** using ggplot2 and other R packages to improve data interpretation, visual appeal, and accessibility.
- **Conduct time-series analysis and forecasting** to identify key trends in Singapore’s trade in services, evaluate growth patterns across service categories, and predict potential future trade movements.
## 2. Getting started
### 2.1 Loading the packages
The following R packages will be loaded for this exercise using `pacman::p_load()`:
- [**readxl**](https://www.rdocumentation.org/packages/readxl/versions/1.4.3): Reads Excel files (.xls and .xlsx) into R
- [**writexl**](https://www.rdocumentation.org/packages/misty/versions/0.4.6/topics/write.xlsx): Writes data frames to Excel files (.xlsx).
- [**tidyverse**](https://www.rdocumentation.org/packages/tidyverse/versions/2.0.0): A collection of packages for data manipulation, visualization, and modeling.
- [**timetk**](https://www.rdocumentation.org/packages/timetk/versions/2.9.0): Provides time series analysis and forecasting tools.
- [**forecast**](https://www.rdocumentation.org/packages/forecast/versions/8.23.0): Implements forecasting methods like ARIMA, ETS, and more.
- [**ggplot2**](https://www.rdocumentation.org/packages/ggplot2/versions/3.5.0): A powerful package for data visualization using the grammar of graphics.
- [**plotly**](https://www.rdocumentation.org/packages/plotly/versions/4.10.4): Creates interactive plots, including 3D and web-based visualizations.
- [**treemapify**](https://www.rdocumentation.org/packages/treemapify/versions/0.2.1): Enables the creation of treemap visualizations in ggplot2.
- [**patchwork**](https://www.rdocumentation.org/packages/patchwork/versions/0.0.1): Helps combine multiple ggplot2 plots into a single layout.
- [**dplyr**](https://www.rdocumentation.org/packages/dplyr/versions/0.7.8): Provides fast and intuitive data manipulation functions.
- [**CGPfunctions**](https://www.rdocumentation.org/packages/CGPfunctions/versions/0.6.3): Offers additional plotting and visualization functions.
- [**rnaturalearth**](https://www.rdocumentation.org/packages/rnaturalearth/versions/0.0.0.9000): Provides world map data for geographic visualizations.
- [**rnaturalearthdata**](https://www.rdocumentation.org/packages/rnaturalearthdata/versions/1.0.0): Supplies natural Earth vector data for spatial analysis.
- [**sf**](https://www.rdocumentation.org/packages/sf/versions/0.1-1): Supports simple features for spatial data (GIS).
- [**viridis**](https://www.rdocumentation.org/packages/viridisLite/versions/0.4.2/topics/viridis): Provides color palettes for better visualization accessibility.
- [**ggrepel**](https://www.rdocumentation.org/packages/ggrepel/versions/0.9.6): Improves text labeling in ggplot2 by avoiding overlapping labels.
- [**ggHoriPlot**](https://www.rdocumentation.org/packages/ggHoriPlot/versions/1.0.1): Creates horizontal bar plots for ggplot2.
- [**ggthemes**](https://www.rdocumentation.org/packages/ggthemes/versions/5.1.0): Adds additional themes and styles for ggplot2.
- [**tidymodels**](https://www.rdocumentation.org/packages/tidymodels/versions/1.3.0): A collection of packages for machine learning and modeling.
- [**modeltime**](https://www.rdocumentation.org/packages/modeltime/versions/1.3.0): A framework for time series forecasting using machine learning and statistical models.
```{r}
pacman::p_load(readxl, openxlsx, data.table, tidyverse, timetk, plotly, forecast, ggplot2, CGPfunctions, rnaturalearth, rnaturalearthdata, sf, viridis, ggrepel, ggHoriPlot, ggthemes, tidymodels, timetk, modeltime)
```
## 3. DataVis makeover 1
### 3.1 Makeover of - Overall Exports and Import of Services
#### 3.1.1 Original visualization
The original visualizations analyzed in this analysis are sourced from [here](https://www.singstat.gov.sg/modules/infographics/singapore-international-trade).
{width="576"}
#### 3.1.2 Evaluation of the original visualization
In the below section, we will evaluate the effectiveness of the above visualization by identifying its pros and cons, focusing on aspects such as (1) **clarity**, and (2) **visual appeal**:
- **Clarity**: How well the data is presented and understood
| Pros | Cons | Suggested fixes |
|------------------------|------------------------|------------------------|
| **Clear categorization of exports and imports** - The chart effectively differentiates between exports and imports | **Year-specific colors make trend analysis difficult** – Each year is assigned a different color, making it hard to track trends across time. | **Use consistent colors for exports and imports across all years** (Implemented by converting them into line charts). |
| **Total trade values are highlighted** – The total trade (Exports + Imports) is displayed for each year. | **Floating "Total" labels can be easily overlooked** – They are placed above bars separately, which may lead to misinterpretation. | Introduce a line chart for total trade trends instead of floating labels. |
| **Growth rate (CAGR) and trade balance data are included** – Additional insights are provided. | **Trade balance is not well integrated** – It is displayed separately at the bottom instead of being visually linked to the bars. | **Represent trade balance directly as a bar chart** (Green for surplus, Red for deficit) |
- **Visual appeal**: How visually engaging and effective the design is
| Pros | Cons | Suggested fixes |
|------------------------|------------------------|------------------------|
| **Engaging use of colors and icons** – Makes the visualization appealing and eye-catching. | **Overuse of colors creates clutter** – Different colors for each year make it visually overwhelming. | Reduce unnecessary color variations and **simplify color coding**. |
### 3.2 Makeover of the original data visualization
#### 3.2.1 Data wrangling & data prepration
##### 3.2.1.1 Importing the data
The code chunk below imports the [Trade In Services By Services Category](https://tablebuilder.singstat.gov.sg/table/TS/M060251) dataset, downloaded from [Department of Statistics Singapore, DOS](https://www.singstat.gov.sg/), using the `read_excel()` function from the `readxl` package.
```{r}
trade_services <- read_excel("data/Trade In Services By Services Category_base.xlsx")
```
##### 3.2.1.2 Understanding the data structure
- `glimpse()`: provides a transposed overview of a dataset, showing variables and their types in a concise format.
- `head()`: displays the first few rows of a dataset (default is 6 rows) to give a quick preview of the data.
- `summary()`: generates a statistical summary of each variable, including measures like mean, median, and range for numeric data.
- `duplicated()`:returns a logical vector indicating which elements or rows in a vector or data frame are duplicates.
- `colSums(is.na())`: counts the number of missing values (NA) in each column of the data frame.
- `str()`: use `str()` to display the column names, data types, and a preview of the data.
:::::: panel-tabset
## glimpse()
```{r}
glimpse(trade_services)
```
## head()
```{r}
head(trade_services)
```
## summary()
```{r}
summary(trade_services)
```
## duplicated()
```{r}
trade_services[duplicated(trade_services),]
```
::: callout-note
- Ensure that there are no duplicated columns, if not will have to investigate further.
:::
## colSum(is.na())
```{r}
colSums(is.na(trade_services))
```
::: callout-note
- Ensure that there are no NA values, if not will have to investigate further.
- Possibility to use `drop_na()` function to drop rows where any specified column contains a missing value.
:::
## str())
```{r}
str(trade_services)
```
::: callout-note
- Ensure that all variables are correctly classified by data type; recast variable types if needed.
- Variables are correctly classified - where categorical variables are classified as **character**, while continuous variables are classified as **double**.
:::
::::::
The `trade_services` tibble contains 26 attributes, as shown above.
The following preprocessing checks were conducted as part of data preparation:
::: callout-tip
## Preprocessing Checks
- Verified that the correct data types were loaded in the `trade_services` dataset using `glimpse()` and `str()`
- Ensured there were no duplicate variable names using `duplicated()` in the dataset
- Checked for missing values using `colSums(is.na())`
:::
::: callout-note
- `trade_services` dataset will be used for both **DataVis makeover 1 & 2**
:::
##### 3.2.1.3 Data preparation and filtering for relevant variables
After importing the `trade_services` dataset, we will filter for the three rows - **Exports of Services, Imports of Services,** and **Total Trade in Services**, which are essential for recreating the original visualization.
```{r}
# Select rows where 'Data Series' is either "Exports Of Services" or "Imports Of Services"
export_vs_import <- trade_services %>%
filter(`Data Series` %in% c("Exports Of Services", "Imports Of Services", "Total Trade In Services"))
# View the filtered data
print(export_vs_import)
```
#### 3.2.2 Revised visualization - Combination plot
Key makeover changes:
1️⃣ **Overall design improvements:**
- **Replaced bar chart with a line chart for exports and imports** –\> Helps in tracking trends more effectively over time instead of color-coded bars per year.
- **Introduced a dashed line for total trade values** –\> Previously, total trade was only displayed as floating labels above bars, which could be overlooked.
- **Changed color usage** –\> Original chart had different colors for each year, making trend analysis harder. Now, consistent colors are used:
- Exports are in [red]{style="color:red"},
- Imports are in [blue]{style="color:blue"},
- Total Trade is in [black (dashed)]{style="color:black"},
- Trade Surplus is in [green]{style="color:green"}, and
- Trade Deficit is in [red]{style="color:red"}.
2️⃣ **Trade Balance Integration:**
- **Integration of key insights** –\> Previously trade balance was displayed separately as a small section at the bottom. Now, I have integrated it as a bar chart within the main graph
- [green]{style="color:green"} bars represent trade surplus, and
- [red]{style="color:red"} bars will represent trade deficit.
::: callout-note
## Overall
- The new chart is cleaner, and provides a better narrative of how exports, imports, total trade, and trade balance evolve over time.
- The new chart simplifies year-on-year trend analysis, making it easier to identify patterns in Singapore's international trade.
:::
::: panel-tabset
## Enhance graph()
```{r}
#| echo = FALSE
library(ggplot2)
library(dplyr)
library(tidyr)
# Filter relevant rows (Exports, Imports, and Total Trade)
export_vs_import <- trade_services %>%
filter(`Data Series` %in% c("Exports Of Services", "Imports Of Services", "Total Trade In Services"))
# Convert dataset from wide to long format for ggplot
data_long <- export_vs_import %>%
pivot_longer(cols = -`Data Series`, names_to = "Year", values_to = "Value") %>%
mutate(Year = as.numeric(Year)) %>%
filter(Year >= 2020 & Year <= 2024)
# Extract `Total Trade In Services` from dataset
total_trade_data <- data_long %>%
filter(`Data Series` == "Total Trade In Services") %>%
select(Year, Value) %>%
rename(Total_Trade = Value)
# Calculate Trade Balance (Exports - Imports)
trade_balance <- data_long %>%
spread(`Data Series`, Value) %>%
mutate(Services_Trade_Balance = `Exports Of Services` - `Imports Of Services`) %>%
select(Year, Services_Trade_Balance)
# Merge Trade Balance & Total Trade into DataFrame
data_long <- left_join(data_long, trade_balance, by = "Year")
data_long <- left_join(data_long, total_trade_data, by = "Year")
# Increase plotting window size
options(repr.plot.width=16, repr.plot.height=9)
# Create the plot
p <- ggplot(data_long) +
# **Line Chart for Exports**
geom_line(data = subset(data_long, `Data Series` == "Exports Of Services"),
aes(x = Year, y = Value, color = "Exports"), linewidth = 1.5) +
# **Data Labels for Exports**
geom_text(data = subset(data_long, `Data Series` == "Exports Of Services"),
aes(x = Year, y = Value, label = round(Value, 1)),
vjust = -1, size = 2.5, color = "red") +
# **Line Chart for Imports**
geom_line(data = subset(data_long, `Data Series` == "Imports Of Services"),
aes(x = Year, y = Value, color = "Imports"), linewidth = 1.5) +
# **Data Labels for Imports**
geom_text(data = subset(data_long, `Data Series` == "Imports Of Services"),
aes(x = Year, y = Value, label = round(Value, 1)),
vjust = 1.5, size = 2.5, color = "blue") +
# **Line Chart for Total Trade (Black, Dashed)**
geom_line(data = total_trade_data,
aes(x = Year, y = Total_Trade, color = "Total Trade"), linewidth = 1.5, linetype = "dashed") +
# **Labels for Total Trade**
geom_text(data = total_trade_data,
aes(x = Year, y = Total_Trade + 20000, label = round(Total_Trade, 1)),
vjust = -0.5, size = 2.5, color = "black") +
# **Bar Chart for Trade Balance (Surplus = Green, Deficit = Red)**
geom_bar(data = trade_balance,
aes(x = Year, y = Services_Trade_Balance,
fill = ifelse(Services_Trade_Balance >= 0, "Surplus", "Deficit")),
stat = "identity", width = 0.5) +
# **Move Labels to the Top of Trade Balance Bars**
geom_text(data = trade_balance,
aes(x = Year, y = Services_Trade_Balance + 5000,
label = round(Services_Trade_Balance, 1)),
vjust = -0.5, size = 2.5, color = "black") +
# **Title & Labels**
labs(title = "Trends in International Trade in Services (2020-2024)",
subtitle = "Exports, Imports & Total Trade as Lines, Trade Balance as Bars",
x = "Year", y = "S$ Billion") +
# **Custom Theme**
theme_minimal(base_size = 10) +
# **Color Customization**
scale_color_manual(name = "Category", values = c("Exports" = "red", "Imports" = "blue", "Total Trade" = "black")) +
scale_fill_manual(name = "Trade Balance", values = c("Surplus" = "green", "Deficit" = "red")) +
# **Formatting**
theme(
legend.position = "bottom",
panel.grid.major.y = element_line(color = "gray", linetype = "dashed"),
axis.text.x = element_text(face = "bold"))
# Display the plot
print(p)
```
## Code()
```{r}
#| eval = FALSE
# Filter relevant rows (Exports, Imports, and Total Trade)
export_vs_import <- trade_services %>%
filter(`Data Series` %in% c("Exports Of Services", "Imports Of Services", "Total Trade In Services"))
# Convert dataset from wide to long format for ggplot
data_long <- export_vs_import %>%
pivot_longer(cols = -`Data Series`, names_to = "Year", values_to = "Value") %>%
mutate(Year = as.numeric(Year)) %>%
filter(Year >= 2020 & Year <= 2024)
# Extract `Total Trade In Services` from dataset
total_trade_data <- data_long %>%
filter(`Data Series` == "Total Trade In Services") %>%
select(Year, Value) %>%
rename(Total_Trade = Value)
# Calculate Trade Balance (Exports - Imports)
trade_balance <- data_long %>%
spread(`Data Series`, Value) %>%
mutate(Services_Trade_Balance = `Exports Of Services` - `Imports Of Services`) %>%
select(Year, Services_Trade_Balance)
# Merge Trade Balance & Total Trade into DataFrame
data_long <- left_join(data_long, trade_balance, by = "Year")
data_long <- left_join(data_long, total_trade_data, by = "Year")
# Increase plotting window size
options(repr.plot.width=16, repr.plot.height=9)
# Create the plot
p <- ggplot(data_long) +
# **Line Chart for Exports**
geom_line(data = subset(data_long, `Data Series` == "Exports Of Services"),
aes(x = Year, y = Value, color = "Exports"), linewidth = 1.5) +
# **Data Labels for Exports**
geom_text(data = subset(data_long, `Data Series` == "Exports Of Services"),
aes(x = Year, y = Value, label = round(Value, 1)),
vjust = -1, size = 2.5, color = "red") +
# **Line Chart for Imports**
geom_line(data = subset(data_long, `Data Series` == "Imports Of Services"),
aes(x = Year, y = Value, color = "Imports"), linewidth = 1.5) +
# **Data Labels for Imports**
geom_text(data = subset(data_long, `Data Series` == "Imports Of Services"),
aes(x = Year, y = Value, label = round(Value, 1)),
vjust = 1.5, size = 2.5, color = "blue") +
# **Line Chart for Total Trade (Black, Dashed)**
geom_line(data = total_trade_data,
aes(x = Year, y = Total_Trade, color = "Total Trade"), linewidth = 1.5, linetype = "dashed") +
# **Labels for Total Trade**
geom_text(data = total_trade_data,
aes(x = Year, y = Total_Trade + 20000, label = round(Total_Trade, 1)),
vjust = -0.5, size = 2.5, color = "black") +
# **Bar Chart for Trade Balance (Surplus = Green, Deficit = Red)**
geom_bar(data = trade_balance,
aes(x = Year, y = Services_Trade_Balance,
fill = ifelse(Services_Trade_Balance >= 0, "Surplus", "Deficit")),
stat = "identity", width = 0.5) +
# **Move Labels to the Top of Trade Balance Bars**
geom_text(data = trade_balance,
aes(x = Year, y = Services_Trade_Balance + 5000,
label = round(Services_Trade_Balance, 1)),
vjust = -0.5, size = 2.5, color = "black") +
# **Title & Labels**
labs(title = "Trends in International Trade in Services (2020-2024)",
subtitle = "Exports, Imports & Total Trade as Lines, Trade Balance as Bars",
x = "Year", y = "S$ Billion") +
# **Custom Theme**
theme_minimal(base_size = 10) +
# **Color Customization**
scale_color_manual(name = "Category", values = c("Exports" = "red", "Imports" = "blue", "Total Trade" = "black")) +
scale_fill_manual(name = "Trade Balance", values = c("Surplus" = "green", "Deficit" = "red")) +
# **Formatting**
theme(
legend.position = "bottom",
panel.grid.major.y = element_line(color = "gray", linetype = "dashed"),
axis.text.x = element_text(face = "bold"))
# Display the plot
print(p)
```
:::
### 3.3 Time series analysis
I conducted time-series analysis to analyze trends in exports, imports, and total trade from 2000 to 2024. First, I visualized the historical data using `ggplot2`, ensuring clear differentiation of trends by assigning red for exports, blue for imports, and black for total trade. This provided insights into the overall growth patterns and fluctuations over time.
Key observations:
- **Steady growth:** Singapore’s total trade in services has shown a strong upward trend, reflecting its role as a global trade hub.
- **Exports Lead Imports:** While both have grown, exports consistently exceed imports, indicating a positive trade balance.
- **Impact of Global Events:** Periods of slower growth (e.g., 2008-2012) align with economic disruptions like the Global Financial Crisis.
- **Accelerated Growth After 2018:** Likely driven by Singapore’s push for digital services, financial innovation, and trade agreements.
::: panel-tabset
## Graph()
```{r}
#| echo = FALSE
# Load necessary libraries
library(ggplot2)
library(tidyverse)
library(forecast)
library(plotly)
# Convert data from wide to long format
export_vs_import_long <- export_vs_import %>%
pivot_longer(cols = -`Data Series`, names_to = "Year", values_to = "Value") %>%
mutate(Year = as.numeric(Year)) %>%
arrange(Year) # Ensure years are sorted in ascending order
# Create separate data frames for each category to ensure proper order
exports_data <- export_vs_import_long %>% filter(`Data Series` == "Exports Of Services")
imports_data <- export_vs_import_long %>% filter(`Data Series` == "Imports Of Services")
total_trade_data <- export_vs_import_long %>% filter(`Data Series` == "Total Trade In Services")
# Convert to time-series format (with correctly sorted values)
exports_ts <- ts(exports_data$Value, start = exports_data$Year[1], frequency = 1)
imports_ts <- ts(imports_data$Value, start = imports_data$Year[1], frequency = 1)
total_trade_ts <- ts(total_trade_data$Value, start = total_trade_data$Year[1], frequency = 1)
# Create Data Frame for Plotting
df <- data.frame(
Year = rep(exports_data$Year, 3),
Value = c(exports_data$Value, imports_data$Value, total_trade_data$Value),
Category = rep(c("Exports", "Imports", "Total Trade"), each = nrow(exports_data))
)
# Create a ggplot with explicit group aesthetic
p <- ggplot(df, aes(x = Year, y = Value, color = Category, group = Category,
text = paste0("Year: ", Year, "<br>Value: ", round(Value, 1)))) +
geom_line(linewidth = 1.2) + # Use linewidth instead of size (ggplot2 3.4.0+)
scale_color_manual(values = c("Exports" = "red", "Imports" = "blue", "Total Trade" = "black")) +
ggtitle("Export vs Import vs Total Trade Trends") +
ylab("Value (in Millions)") + xlab("Year") +
theme_minimal()
# Convert to interactive plot with tooltips
interactive_plot <- ggplotly(p, tooltip = "text")
# Display interactive plot
interactive_plot
```
## Code()
```{r}
#| eval = FALSE
# Load necessary libraries
library(ggplot2)
library(tidyverse)
library(forecast)
library(plotly)
# Convert data from wide to long format
export_vs_import_long <- export_vs_import %>%
pivot_longer(cols = -`Data Series`, names_to = "Year", values_to = "Value") %>%
mutate(Year = as.numeric(Year)) %>%
arrange(Year) # Ensure years are sorted in ascending order
# Create separate data frames for each category to ensure proper order
exports_data <- export_vs_import_long %>% filter(`Data Series` == "Exports Of Services")
imports_data <- export_vs_import_long %>% filter(`Data Series` == "Imports Of Services")
total_trade_data <- export_vs_import_long %>% filter(`Data Series` == "Total Trade In Services")
# Convert to time-series format (with correctly sorted values)
exports_ts <- ts(exports_data$Value, start = exports_data$Year[1], frequency = 1)
imports_ts <- ts(imports_data$Value, start = imports_data$Year[1], frequency = 1)
total_trade_ts <- ts(total_trade_data$Value, start = total_trade_data$Year[1], frequency = 1)
# Create Data Frame for Plotting
df <- data.frame(
Year = rep(exports_data$Year, 3),
Value = c(exports_data$Value, imports_data$Value, total_trade_data$Value),
Category = rep(c("Exports", "Imports", "Total Trade"), each = nrow(exports_data))
)
# Create a ggplot with explicit group aesthetic
p <- ggplot(df, aes(x = Year, y = Value, color = Category, group = Category,
text = paste0("Year: ", Year, "<br>Value: ", round(Value, 1)))) +
geom_line(linewidth = 1.2) + # Use linewidth instead of size (ggplot2 3.4.0+)
scale_color_manual(values = c("Exports" = "red", "Imports" = "blue", "Total Trade" = "black")) +
ggtitle("Export vs Import vs Total Trade Trends") +
ylab("Value (in Millions)") + xlab("Year") +
theme_minimal()
# Convert to interactive plot with tooltips
interactive_plot <- ggplotly(p, tooltip = "text")
# Display interactive plot
interactive_plot
```
:::
### 3.4 Time series forecasting
#### 3.4.1 ETS vs ARIMA model comparison
Based on the model accuracy metrics, ARIMA outperforms ETS in forecasting both exports and imports. ARIMA consistently shows lower RMSE, MAE, and MAPE values, indicating that its predictions are more precise and closer to actual observations. Thus, ARIMA will be the preferred model for time series forecasting.
::: panel-tabset
## Result()
```{r}
#| echo = FALSE
library(forecast)
library(tidyverse)
library(ggplot2)
# Set forecast horizon (e.g., last 5 years)
horizon <- 5
# Ensure there are enough observations for training and test
if (length(exports_ts) <= horizon | length(imports_ts) <= horizon) {
stop("Not enough data points for proper train-test split!")
}
# Split data: Use last `horizon` years as the test set
train_exports <- head(exports_ts, length(exports_ts) - horizon)
test_exports <- tail(exports_ts, horizon)
train_imports <- head(imports_ts, length(imports_ts) - horizon)
test_imports <- tail(imports_ts, horizon)
# Fit ETS models
ets_exports <- ets(train_exports)
ets_imports <- ets(train_imports)
# Fit ARIMA models
arima_exports <- auto.arima(train_exports)
arima_imports <- auto.arima(train_imports)
# 🔹 Generate Forecasts BEFORE extracting forecast values
ets_forecast_exports <- forecast(ets_exports, h = horizon)
ets_forecast_imports <- forecast(ets_imports, h = horizon)
arima_forecast_exports <- forecast(arima_exports, h = horizon)
arima_forecast_imports <- forecast(arima_imports, h = horizon)
# 🔹 Extract the forecasted mean values
ets_forecast_exports_values <- as.numeric(ets_forecast_exports$mean)
ets_forecast_imports_values <- as.numeric(ets_forecast_imports$mean)
arima_forecast_exports_values <- as.numeric(arima_forecast_exports$mean)
arima_forecast_imports_values <- as.numeric(arima_forecast_imports$mean)
# Ensure test sets are numeric
test_exports <- as.numeric(test_exports)
test_imports <- as.numeric(test_imports)
# Ensure test and forecast lengths match
if (length(test_exports) != length(ets_forecast_exports_values)) {
test_exports <- head(test_exports, length(ets_forecast_exports_values))
}
if (length(test_imports) != length(ets_forecast_imports_values)) {
test_imports <- head(test_imports, length(ets_forecast_imports_values))
}
# 🔹 Use accuracy() correctly with the model object
ets_accuracy_exports <- forecast::accuracy(ets_forecast_exports, test_exports)
ets_accuracy_imports <- forecast::accuracy(ets_forecast_imports, test_imports)
arima_accuracy_exports <- forecast::accuracy(arima_forecast_exports, test_exports)
arima_accuracy_imports <- forecast::accuracy(arima_forecast_imports, test_imports)
print(ets_accuracy_exports)
print(arima_accuracy_exports)
# 🔹 Combine accuracy results into a structured dataframe
accuracy_df <- tibble(
Model = rep(c("ETS", "ARIMA"), each = 2),
Type = rep(c("Exports", "Imports"), times = 2),
RMSE = c(ets_accuracy_exports["Test set", "RMSE"],
ets_accuracy_imports["Test set", "RMSE"],
arima_accuracy_exports["Test set", "RMSE"],
arima_accuracy_imports["Test set", "RMSE"]),
MAE = c(ets_accuracy_exports["Test set", "MAE"],
ets_accuracy_imports["Test set", "MAE"],
arima_accuracy_exports["Test set", "MAE"],
arima_accuracy_imports["Test set", "MAE"]),
MAPE = c(ets_accuracy_exports["Test set", "MAPE"],
ets_accuracy_imports["Test set", "MAPE"],
arima_accuracy_exports["Test set", "MAPE"],
arima_accuracy_imports["Test set", "MAPE"])
)
# Print final accuracy dataframe
print(accuracy_df)
```
## Code()
```{r}
#| eval = FALSE
library(forecast)
library(tidyverse)
library(ggplot2)
# Set forecast horizon (e.g., last 5 years)
horizon <- 5
# Ensure there are enough observations for training and test
if (length(exports_ts) <= horizon | length(imports_ts) <= horizon) {
stop("Not enough data points for proper train-test split!")
}
# Split data: Use last `horizon` years as the test set
train_exports <- head(exports_ts, length(exports_ts) - horizon)
test_exports <- tail(exports_ts, horizon)
train_imports <- head(imports_ts, length(imports_ts) - horizon)
test_imports <- tail(imports_ts, horizon)
# Fit ETS models
ets_exports <- ets(train_exports)
ets_imports <- ets(train_imports)
# Fit ARIMA models
arima_exports <- auto.arima(train_exports)
arima_imports <- auto.arima(train_imports)
# 🔹 Generate Forecasts BEFORE extracting forecast values
ets_forecast_exports <- forecast(ets_exports, h = horizon)
ets_forecast_imports <- forecast(ets_imports, h = horizon)
arima_forecast_exports <- forecast(arima_exports, h = horizon)
arima_forecast_imports <- forecast(arima_imports, h = horizon)
# 🔹 Extract the forecasted mean values
ets_forecast_exports_values <- as.numeric(ets_forecast_exports$mean)
ets_forecast_imports_values <- as.numeric(ets_forecast_imports$mean)
arima_forecast_exports_values <- as.numeric(arima_forecast_exports$mean)
arima_forecast_imports_values <- as.numeric(arima_forecast_imports$mean)
# Ensure test sets are numeric
test_exports <- as.numeric(test_exports)
test_imports <- as.numeric(test_imports)
# Ensure test and forecast lengths match
if (length(test_exports) != length(ets_forecast_exports_values)) {
test_exports <- head(test_exports, length(ets_forecast_exports_values))
}
if (length(test_imports) != length(ets_forecast_imports_values)) {
test_imports <- head(test_imports, length(ets_forecast_imports_values))
}
# 🔹 Use accuracy() correctly with the model object
ets_accuracy_exports <- forecast::accuracy(ets_forecast_exports, test_exports)
ets_accuracy_imports <- forecast::accuracy(ets_forecast_imports, test_imports)
arima_accuracy_exports <- forecast::accuracy(arima_forecast_exports, test_exports)
arima_accuracy_imports <- forecast::accuracy(arima_forecast_imports, test_imports)
print(ets_accuracy_exports)
print(arima_accuracy_exports)
# 🔹 Combine accuracy results into a structured dataframe
accuracy_df <- tibble(
Model = rep(c("ETS", "ARIMA"), each = 2),
Type = rep(c("Exports", "Imports"), times = 2),
RMSE = c(ets_accuracy_exports["Test set", "RMSE"],
ets_accuracy_imports["Test set", "RMSE"],
arima_accuracy_exports["Test set", "RMSE"],
arima_accuracy_imports["Test set", "RMSE"]),
MAE = c(ets_accuracy_exports["Test set", "MAE"],
ets_accuracy_imports["Test set", "MAE"],
arima_accuracy_exports["Test set", "MAE"],
arima_accuracy_imports["Test set", "MAE"]),
MAPE = c(ets_accuracy_exports["Test set", "MAPE"],
ets_accuracy_imports["Test set", "MAPE"],
arima_accuracy_exports["Test set", "MAPE"],
arima_accuracy_imports["Test set", "MAPE"])
)
# Print final accuracy dataframe
print(accuracy_df)
```
:::
#### 3.4.2 ARIMA model
Next, I applied **ARIMA modeling** (`auto.arima()`) using the `forecast package` to predict trade values for the next five years (2025-2029).
::: panel-tabset
## ARIMA Forecast for Exports()
```{r}
#| echo = FALSE
library(forecast)
# Apply ARIMA Model
arima_exports <- auto.arima(exports_ts)
arima_imports <- auto.arima(imports_ts)
arima_total_trade <- auto.arima(total_trade_ts)
# Forecast for the next 5 years
arima_forecast_exports <- forecast(arima_exports, h = 5)
arima_forecast_imports <- forecast(arima_imports, h = 5)
arima_forecast_total_trade <- forecast(arima_total_trade, h = 5)
# Plot ARIMA forecasts
autoplot(arima_forecast_exports) + ggtitle("ARIMA Forecast for Exports")
```
## ARIMA Forecast for Imports()
```{r}
#| echo = FALSE
autoplot(arima_forecast_imports) + ggtitle("ARIMA Forecast for Imports")
```
## ARIMA Forecast for Total Trade()
```{r}
#| echo = FALSE
autoplot(arima_forecast_total_trade) + ggtitle("ARIMA Forecast for Total Trade")
```
## Combined Forecast()
```{r}
#| echo = FALSE
# Convert forecasts to data frames for ggplot
df_exports <- as.data.frame(arima_forecast_exports) %>% mutate(Year = seq(max(exports_data$Year) + 1, by = 1, length.out = 5), Category = "Exports")
df_imports <- as.data.frame(arima_forecast_imports) %>% mutate(Year = seq(max(imports_data$Year) + 1, by = 1, length.out = 5), Category = "Imports")
df_total_trade <- as.data.frame(arima_forecast_total_trade) %>% mutate(Year = seq(max(total_trade_data$Year) + 1, by = 1, length.out = 5), Category = "Total Trade")
# Combine all forecasts
df_forecast <- bind_rows(df_exports, df_imports, df_total_trade)
# Create the forecast plot
ggplot(df_forecast, aes(x = Year, y = `Point Forecast`, color = Category)) +
geom_line(linewidth = 1.5) + # Use linewidth instead of size (ggplot2 3.4.0+)
scale_color_manual(values = c("Exports" = "red", "Imports" = "blue", "Total Trade" = "black")) +
ggtitle("ARIMA Forecast for Exports, Imports, and Total Trade") +
ylab("Forecasted Value (in Millions)") + xlab("Year") +
theme_minimal()
```
## Code()
```{r}
#| eval = FALSE
library(forecast)
# Apply ARIMA Model
arima_exports <- auto.arima(exports_ts)
arima_imports <- auto.arima(imports_ts)
arima_total_trade <- auto.arima(total_trade_ts)
# Forecast for the next 5 years
arima_forecast_exports <- forecast(arima_exports, h = 5)
arima_forecast_imports <- forecast(arima_imports, h = 5)
arima_forecast_total_trade <- forecast(arima_total_trade, h = 5)
# Plot ARIMA forecasts
autoplot(arima_forecast_exports) + ggtitle("ARIMA Forecast for Exports")
autoplot(arima_forecast_imports) + ggtitle("ARIMA Forecast for Imports")
autoplot(arima_forecast_total_trade) + ggtitle("ARIMA Forecast for Total Trade")
# Convert forecasts to data frames for ggplot
df_exports <- as.data.frame(arima_forecast_exports) %>% mutate(Year = seq(max(exports_data$Year) + 1, by = 1, length.out = 5), Category = "Exports")
df_imports <- as.data.frame(arima_forecast_imports) %>% mutate(Year = seq(max(imports_data$Year) + 1, by = 1, length.out = 5), Category = "Imports")
df_total_trade <- as.data.frame(arima_forecast_total_trade) %>% mutate(Year = seq(max(total_trade_data$Year) + 1, by = 1, length.out = 5), Category = "Total Trade")
# Combine all forecasts
df_forecast <- bind_rows(df_exports, df_imports, df_total_trade)
# Create the forecast plot
ggplot(df_forecast, aes(x = Year, y = `Point Forecast`, color = Category)) +
geom_line(linewidth = 1.5) + # Use linewidth instead of size (ggplot2 3.4.0+)
scale_color_manual(values = c("Exports" = "red", "Imports" = "blue", "Total Trade" = "black")) +
ggtitle("ARIMA Forecast for Exports, Imports, and Total Trade") +
ylab("Forecasted Value (in Millions)") + xlab("Year") +
theme_minimal()
```
:::
#### 3.4.2 Forecast accuracy
We will also be testing the accuracy of the forecast model to evaluate its performance and reliability.
Standard market practices for evaluating forecast accuracy rely on key metrics such as MAPE, RMSE, MAE, and ACF1.
- Based on these measures, the imports forecast is highly accurate (MAPE \~4.84%), while exports and total trade forecasts fall within the "good" range (MAPE \~6.10% and \~5.23%).
- The absence of significant autocorrelation in residuals suggests the models effectively capture trends.
- The high RMSE values, particularly for total trade, indicate potential reliability issues due to large fluctuations in data.
::: panel-tabset
## Forecast accuracy for Exports()
```{r}
#| echo = FALSE
# Ensure test set is numeric
test_exports <- as.numeric(test_exports)
# Ensure test set length matches forecasted values
test_exports <- head(test_exports, length(arima_forecast_exports$mean))
# Compute and print accuracy
cat("### Forecast Accuracy for Exports ###\n")
print(forecast::accuracy(arima_forecast_exports, test_exports))
```
## Forecast accuracy for Imports()
```{r}
#| echo = FALSE
# Ensure test set is numeric
test_imports <- as.numeric(test_imports)
# Ensure test set length matches forecasted values
test_imports <- head(test_imports, length(arima_forecast_imports$mean))
# Print forecast accuracy for imports
cat("\n### Forecast Accuracy for Imports ###\n")
print(forecast::accuracy(arima_forecast_imports, test_imports))
```
## Forecast accuracy for Total Trade()
```{r}
#| echo = FALSE
# Define horizon (forecasting period)
horizon <- 5
# Ensure there are enough observations for training and test
if (length(total_trade_ts) <= horizon) {
stop("Not enough data points for proper train-test split!")
}
# Split total trade data: Use last `horizon` years as the test set
train_total_trade <- head(total_trade_ts, length(total_trade_ts) - horizon)
test_total_trade <- tail(total_trade_ts, horizon)
# Ensure test set is numeric
test_total_trade <- as.numeric(test_total_trade)
# Ensure test set length matches forecasted values
test_total_trade <- head(test_total_trade, length(arima_forecast_total_trade$mean))
# Print forecast accuracy for total trade
cat("\n### Forecast Accuracy for Total Trade ###\n")
print(forecast::accuracy(arima_forecast_total_trade, test_total_trade))
```
## Code()
```{r}
#| eval = FALSE
# Compute and print accuracy
cat("### Forecast Accuracy for Exports ###\n")
print(forecast::accuracy(arima_forecast_exports, test_exports))
# Print forecast accuracy for imports
cat("\n### Forecast Accuracy for Imports ###\n")
print(forecast::accuracy(arima_forecast_imports, test_imports))
# Print forecast accuracy for total trade
cat("\n### Forecast Accuracy for Total Trade ###\n")
print(forecast::accuracy(arima_forecast_total_trade, test_total_trade))
```
:::
## 4. DataVis makeover 2
### 4.1 Makeover of - Exports/ Imports of Services by Services Category
#### 4.1.1 Original visualization
The original visualizations analyzed in this analysis are sourced from [here](https://www.singstat.gov.sg/modules/infographics/singapore-international-trade).
{width="506"}
{width="510"}
#### 4.1.2 Evaluation of the original visualization
In the below section, we will evaluate the effectiveness of the above visualization by identifying its pros and cons, focusing on aspects such as (1) **clarity**, and (2) **visual appeal**:
- **Clarity**: How well the data is presented and understood
| Pros | Cons | Suggested fixes |
|------------------------|------------------------|------------------------|
| **Clearly labeled service categories with values and percentages** – Each category is distinctly labeled, making it easy to understand the contributions of each service type. | **Pie charts are not effective for comparing data slices** – When multiple categories have close values, it is difficult to differentiate them accurately. | **Replace pie charts with bar charts or grouped bar charts -** for clearer comparison of service categories. |
| **Percentage values enhance clarity** – Helps users quickly understand each category’s contribution to total exports/ imports. | **Pie charts are only useful when there are limited categories** – Too many slices make it difficult to interpret, as smaller sections become unreadable. | **Replace with a bar chart or grouped bar chart** to improve readability. |
| | **The pie chart format makes trend comparison difficult** – Pie charts only represent a single year and do not show how exports have changed over time. | **Add a trend chart (e.g., line graph or bar chart)** – Including historical data can help identify patterns and trends over multiple years. |
- **Visual appeal**: How visually engaging and effective the design is
| Pros | Cons | Suggested fixes |
|------------------------|------------------------|------------------------|
| **Good use of icons to represent different services** – This makes the chart more engaging and helps with intuitive understanding. | **Pie charts distort perception** – it's hard to accurately compare slice sizes, especially when they are similar. | **Use a** **bar chart, grouped bar chart or treemaps**, which allows for more proportional representation of each service category. |
| **Well-structured layout with categories spread around the chart -** The service categories for both exports and imports maintain a similar placement | **Pie chart format is too cluttered** – The large number of slices makes it difficult to read smaller segments and their labels. | **Use a** **bar chart, grouped bar chart or treemaps** for better spacing and spatial representation. |
| | Too many elements are packed into the chart, making it overwhelming. | Simplify by **removing non-essential elements -** i.e: icons |
| | **No legend for color segmentation** – The chart uses various colors for different service categories, but there is no clear legend explaining their grouping or significance. | **Include a legend or categorize colors meaningfully** – Assign gradient based color scale to visually emphasize trade volume intensity. |
::: callout-tip
## Limitations of pie charts in data visualization
- **Distorted perception:** Pie chart makes comparisons difficult due to the reliance on angles and areas rather than a common baseline.
- **Difficult to compare similar data slices:** Can be misleading when there are many segments/ similar-sized portions making it hard to interpret differences accurately.
- **Space constraints:** Pie chart can take up more space than necessary and can clutter dashboards or reports.
- **Poor for trend analysis:** Pie charts only show a single point in time and do not help in comparing trends over multiple years.
:::
::: callout-note
## Why are the use of pie charts be frown upon in data visualization?
- Refer to this [page](https://bernardmarr.com/why-you-shouldnt-use-pie-charts-in-your-dashboards-and-performance-reports/#:~:text=From%20a%20design%20point%20of,data%20more%20complicated%20than%20before.) to find out why the use of pie charts are discouraged.
:::
### 4.2 Makeover of the original data visualization
#### 4.2.1 Data preparation and filtering for relevant variables
After importing the `trade_services` data set, we will filter for the key categories that contribute to the exports/ imports of services in Singapore, excluding subcategories. The 12 major categories are identified based on indentation, as shown below:
| Data Series | Shortened label |
|------------------------------------|------------------------------------|
| Manufacturing Services On Physical Inputs Owned By Others | Manuf. Services |
| Maintenance And Repair Services | Maintenance & Repair |
| Transport | Transport |
| Travel | Travel |
| Insurance | Insurance |
| Government Goods And Services | Govt. Services |
| Construction | Construction |
| Financial | Financial Services |
| Telecommunications, Computer & Information | Telecom & IT |
| Charges For The Use Of Intellectual Property | Intellectual Property |
| Personal, Cultural And Recreational | Cultural & Recreational |
| Other Business Services | Other Biz Services |
We first clean the "Data Series" column by removing leading and trailing spaces to ensure consistency. Next, we identify the positions of "Exports Of Services" and "Imports Of Services" in the dataset, extracting only the rows between these markers for exports and those following the imports marker for imports. We then filter both datasets to retain only the 12 major service categories as hghlighted above.
Finally, we combine the cleaned exports and imports data into a single structured table, adding a "Trade Type" column to differentiate between imports/ exports.
```{r}
# Trim leading/trailing spaces from 'Data Series' column
trade_services$`Data Series` <- trimws(trade_services$`Data Series`)
# Identify row indices for "Exports Of Services" and "Imports Of Services"
export_start_idx <- which(trade_services$`Data Series` == "Exports Of Services")
import_start_idx <- which(trade_services$`Data Series` == "Imports Of Services")
# Extract rows between "Exports Of Services" and "Imports Of Services"
exports_df <- trade_services[(export_start_idx + 1):(import_start_idx - 1), ]
imports_df <- trade_services[(import_start_idx + 1):nrow(trade_services), ]
# Define the 12 major categories
major_categories <- c(
"Manufacturing Services On Physical Inputs Owned By Others",
"Maintenance And Repair Services",
"Transport",
"Travel",
"Insurance",
"Government Goods And Services",
"Construction",
"Financial",
"Telecommunications, Computer & Information",
"Charges For The Use Of Intellectual Property",
"Personal, Cultural And Recreational",
"Other Business Services"
)
# Filter only the major categories for exports and imports
exports_major <- exports_df %>% filter(`Data Series` %in% major_categories)
imports_major <- imports_df %>% filter(`Data Series` %in% major_categories)
# Add a column to indicate whether it's exports or imports
exports_major$`Trade Type` <- "Exports"
imports_major$`Trade Type` <- "Imports"
# Combine into a single dataframe
final_trade_df <- bind_rows(exports_major, imports_major)
# Reorder columns: Move "Trade Type" to the second column
final_trade_df <- final_trade_df %>% select(`Data Series`, `Trade Type`, everything())
# View the final structured data
print(final_trade_df)
```
#### 4.2.2 Revised visualization - bar chart
Key makeover changes:
1️⃣ **Single chart for imports and exports:**
- **Combined visualization** –\> Instead of two separate pie charts, the new bar chart combines both, allowing for side-by-side comparison.
2️⃣ **Improved readability and comparability:**
- **Grouped bars (red for exports, blue for imports)** –\> Offer a clearer visual distinction between trade types.
- **Sorted categories (largest to smallest)** –\> Focus attention on key contributors, ensuring the most impactful sectors are easily identified.
3️⃣ **Simplified layout:**
- **Removed unnecessary graphics** **(i.e.: icons, maps)** –\>Decluttered the chart to emphasize trade data information.
4️⃣ **Interactive insights:**
- **Hover tooltips** –\> Display category names, trade values, and percentages shares, making the chart more dynamic and user-friendly.
::: callout-note
## Overall
- Better clarity, accuracy, and direct comparison in a single view.
- Eliminates redundancy and improves clarity for faster data interpretation.
:::
::: panel-tabset
## Enhance graph()
```{r}
#| echo = FALSE
# Load required libraries
library(ggplot2)
library(dplyr)
library(tidyr)
library(plotly)
# Filter data for the year 2024
trade_2024 <- final_trade_df %>%
select(`Data Series`, `Trade Type`, `2024`) %>%
rename(Value = `2024`)
# Shorten long category labels for better readability
trade_2024$`Data Series` <- gsub("Manufacturing Services On Physical Inputs Owned By Others", "Manuf. Services", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Maintenance And Repair Services", "Maintenance & Repair", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Government Goods And Services", "Govt. Services", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Financial", "Financial Services", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Telecommunications, Computer & Information", "Telecom & IT", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Charges For The Use Of Intellectual Property", "Intellectual Property", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Personal, Cultural And Recreational", "Cultural & Recreational", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Other Business Services", "Other Biz Services", trade_2024$`Data Series`)
# Compute Total Exports & Imports for Percentage Calculation
total_exports <- sum(trade_2024$Value[trade_2024$`Trade Type` == "Exports"], na.rm = TRUE)
total_imports <- sum(trade_2024$Value[trade_2024$`Trade Type` == "Imports"], na.rm = TRUE)
# Add Percentage Column
trade_2024 <- trade_2024 %>%
mutate(Percentage = ifelse(`Trade Type` == "Exports",
Value / total_exports * 100,
Value / total_imports * 100))
# Sum values for sorting (descending order)
category_order <- trade_2024 %>%
group_by(`Data Series`) %>%
summarise(Total_Trade = sum(Value, na.rm = TRUE)) %>%
arrange(desc(Total_Trade)) %>%
pull(`Data Series`)
# Convert `Data Series` to factor for correct sorting
trade_2024$`Data Series` <- factor(trade_2024$`Data Series`, levels = category_order)
# Define custom colors for Exports (Red) and Imports (Blue)
custom_colors <- c("Exports" = "red", "Imports" = "blue")
# Create the ggplot object
gg <- ggplot(trade_2024, aes(x = `Data Series`, y = Value, fill = `Trade Type`,
text = paste("Category:", `Data Series`,
"<br>Trade Type:", `Trade Type`,
"<br>Value: S$", scales::comma(Value),
"<br>Share:", round(Percentage, 1), "%"))) +
geom_bar(stat = "identity", position = position_dodge(width = 0.8)) + # Side-by-side bars
scale_fill_manual(values = custom_colors) + # Apply custom colors
scale_y_continuous(labels = scales::comma) + # Format y-axis labels
labs(title = "Singapore Trade in Services - 2024",
subtitle = "Side-by-Side Comparison of Imports and Exports for 12 Service Categories",
x = "Service Category",
y = "Trade Value",
fill = "Trade Type") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1), # Rotate x-axis labels
legend.position = "bottom")
# Convert to interactive plot with tooltips
ggplotly(gg, tooltip = "text")
```
## Code()
```{r}
#| eval = FALSE
# Load required libraries
library(ggplot2)
library(dplyr)
library(tidyr)
library(plotly) # For interactive tooltips
# Filter data for the year 2024
trade_2024 <- final_trade_df %>%
select(`Data Series`, `Trade Type`, `2024`) %>%
rename(Value = `2024`)
# Shorten long category labels for better readability
trade_2024$`Data Series` <- gsub("Manufacturing Services On Physical Inputs Owned By Others", "Manuf. Services", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Maintenance And Repair Services", "Maintenance & Repair", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Government Goods And Services", "Govt. Services", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Financial", "Financial Services", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Telecommunications, Computer & Information", "Telecom & IT", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Charges For The Use Of Intellectual Property", "Intellectual Property", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Personal, Cultural And Recreational", "Cultural & Recreational", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Other Business Services", "Other Biz Services", trade_2024$`Data Series`)
# Compute Total Exports & Imports for Percentage Calculation
total_exports <- sum(trade_2024$Value[trade_2024$`Trade Type` == "Exports"], na.rm = TRUE)
total_imports <- sum(trade_2024$Value[trade_2024$`Trade Type` == "Imports"], na.rm = TRUE)
# Add Percentage Column
trade_2024 <- trade_2024 %>%
mutate(Percentage = ifelse(`Trade Type` == "Exports",
Value / total_exports * 100,
Value / total_imports * 100))
# Sum values for sorting (descending order)
category_order <- trade_2024 %>%
group_by(`Data Series`) %>%
summarise(Total_Trade = sum(Value, na.rm = TRUE)) %>%
arrange(desc(Total_Trade)) %>%
pull(`Data Series`)
# Convert `Data Series` to factor for correct sorting
trade_2024$`Data Series` <- factor(trade_2024$`Data Series`, levels = category_order)
# Define custom colors for Exports (Red) and Imports (Blue)
custom_colors <- c("Exports" = "red", "Imports" = "blue")
# Create the ggplot object
gg <- ggplot(trade_2024, aes(x = `Data Series`, y = Value, fill = `Trade Type`,
text = paste("Category:", `Data Series`,
"<br>Trade Type:", `Trade Type`,
"<br>Value: S$", scales::comma(Value),
"<br>Share:", round(Percentage, 1), "%"))) +
geom_bar(stat = "identity", position = position_dodge(width = 0.8)) + # Side-by-side bars
scale_fill_manual(values = custom_colors) + # Apply custom colors
scale_y_continuous(labels = scales::comma) + # Format y-axis labels
labs(title = "Singapore Trade in Services - 2024",
subtitle = "Side-by-Side Comparison of Imports and Exports for 12 Service Categories",
x = "Service Category",
y = "Trade Value",
fill = "Trade Type") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1), # Rotate x-axis labels
legend.position = "bottom")
# Convert to interactive plot with tooltips
ggplotly(gg, tooltip = "text")
```
:::
#### 4.2.3 Revised visualization - Treemap
Key makeover changes:
1️⃣ **Transition from Pie charts to Treemaps**
- **Pie charts are replaced with treemaps** –\> Treemaps uses area size to provide a better spatial comparison instead of circular slices in pie charts
2️⃣ **Added plotly inteactive features:**
- **Enabled interactive treempas using plotly** –\> Allows for hovering over categories to explore trade value more dynamically.
3️⃣ **Gradient-based color scale for trade value:**
- **Applied green gradient (darker green = higher value, lighter green = lower value)** –\> Visually emphasizes trade volume intensity instead of random colors in pie charts.
::: panel-tabset
```{r, echo=FALSE, message=FALSE, warning=FALSE, results="hide"}
# Load required libraries
library(dplyr)
library(tidyr)
library(treemap)
library(plotly)
# Filter data for the year 2024
trade_2024 <- final_trade_df %>%
select(`Data Series`, `Trade Type`, `2024`) %>%
rename(Value = `2024`)
# Shorten long category labels for better readability
trade_2024 <- trade_2024 %>%
mutate(Short_Label = case_when(
`Data Series` == "Manufacturing Services On Physical Inputs Owned By Others" ~ "Manuf. Services",
`Data Series` == "Maintenance And Repair Services" ~ "Maintenance & Repair",
`Data Series` == "Government Goods And Services" ~ "Govt. Services",
`Data Series` == "Financial" ~ "Financial Services",
`Data Series` == "Telecommunications, Computer & Information" ~ "Telecom & IT",
`Data Series` == "Charges For The Use Of Intellectual Property" ~ "Intellectual Property",
`Data Series` == "Personal, Cultural And Recreational" ~ "Cultural & Recreational",
`Data Series` == "Other Business Services" ~ "Other Biz Services",
TRUE ~ `Data Series`
))
# Split the dataset into Exports and Imports
trade_exports <- trade_2024 %>% filter(`Trade Type` == "Exports")
trade_imports <- trade_2024 %>% filter(`Trade Type` == "Imports")
# Create Treemap Data for Exports
treemap_exports <- treemap(trade_exports,
index = "Short_Label",
vSize = "Value",
type = "index",
palette = "Blues",
title = "Exports Treemap - 2024",
draw = FALSE)
# Create Treemap Data for Imports
treemap_imports <- treemap(trade_imports,
index = "Short_Label",
vSize = "Value",
type = "index",
palette = "Oranges",
title = "Imports Treemap - 2024",
draw = FALSE)
```
## Enhance graph()
```{r}
#| echo = FALSE
# Rename the index column correctly for both treemaps
colnames(treemap_exports$tm)[colnames(treemap_exports$tm) == "index"] <- "Short_Label"
colnames(treemap_imports$tm)[colnames(treemap_imports$tm) == "index"] <- "Short_Label"
# Convert Exports Treemap Data to Plotly (Darker Green for Higher Values)
p1_interactive <- plot_ly(
data = treemap_exports$tm,
labels = ~Short_Label,
parents = "",
values = ~vSize,
text = ~paste("Category:", Short_Label, "<br>Value:", vSize),
type = "treemap",
textinfo = "label+text",
marker = list(
colorscale = list(c(0, 1), c("#d9f2d9", "#006400")), # Light to Dark Green
cmin = min(treemap_exports$tm$vSize),
cmax = max(treemap_exports$tm$vSize),
colorbar = list(title = "Trade Value")
)
) %>%
layout(title = "Exports Treemap - 2024")
# Convert Imports Treemap Data to Plotly (Darker Green for Higher Values)
p2_interactive <- plot_ly(
data = treemap_imports$tm,
labels = ~Short_Label,
parents = "",
values = ~vSize,
text = ~paste("Category:", Short_Label, "<br>Value:", vSize),
type = "treemap",
textinfo = "label+text",
marker = list(
colorscale = list(c(0, 1), c("#d9f2d9", "#006400")), # Light to Dark Green
cmin = min(treemap_imports$tm$vSize),
cmax = max(treemap_imports$tm$vSize),
colorbar = list(title = "Trade Value")
)
) %>%
layout(title = "Imports Treemap - 2024")
p1_interactive
p2_interactive
```
## Code()
```{r}
#| eval = FALSE
# Load required libraries
library(dplyr)
library(tidyr)
library(treemap)
library(plotly)
# Filter data for the year 2024
trade_2024 <- final_trade_df %>%
select(`Data Series`, `Trade Type`, `2024`) %>%
rename(Value = `2024`)
# Shorten long category labels for better readability
trade_2024 <- trade_2024 %>%
mutate(Short_Label = case_when(
`Data Series` == "Manufacturing Services On Physical Inputs Owned By Others" ~ "Manuf. Services",
`Data Series` == "Maintenance And Repair Services" ~ "Maintenance & Repair",
`Data Series` == "Government Goods And Services" ~ "Govt. Services",
`Data Series` == "Financial" ~ "Financial Services",
`Data Series` == "Telecommunications, Computer & Information" ~ "Telecom & IT",
`Data Series` == "Charges For The Use Of Intellectual Property" ~ "Intellectual Property",
`Data Series` == "Personal, Cultural And Recreational" ~ "Cultural & Recreational",
`Data Series` == "Other Business Services" ~ "Other Biz Services",
TRUE ~ `Data Series`
))
# Split the dataset into Exports and Imports
trade_exports <- trade_2024 %>% filter(`Trade Type` == "Exports")
trade_imports <- trade_2024 %>% filter(`Trade Type` == "Imports")
# Create Treemap Data for Exports
treemap_exports <- treemap(trade_exports,
index = "Short_Label",
vSize = "Value",
type = "index",
palette = "Blues",
title = "Exports Treemap - 2024",
draw = FALSE)
# Create Treemap Data for Imports
treemap_imports <- treemap(trade_imports,
index = "Short_Label",
vSize = "Value",
type = "index",
palette = "Oranges",
title = "Imports Treemap - 2024",
draw = FALSE)
# Convert Exports Treemap Data to Plotly (Darker Green for Higher Values)
p1_interactive <- plot_ly(
data = treemap_exports$tm,
labels = ~Short_Label,
parents = "",
values = ~vSize,
text = ~paste("Category:", Short_Label, "<br>Value:", vSize),
type = "treemap",
textinfo = "label+text",
marker = list(
colorscale = list(c(0, 1), c("#d9f2d9", "#006400")), # Light to Dark Green
cmin = min(treemap_exports$tm$vSize),
cmax = max(treemap_exports$tm$vSize),
colorbar = list(title = "Trade Value")
)
) %>%
layout(title = "Exports Treemap - 2024")
# Convert Imports Treemap Data to Plotly (Darker Green for Higher Values)
p2_interactive <- plot_ly(
data = treemap_imports$tm,
labels = ~Short_Label,
parents = "",
values = ~vSize,
text = ~paste("Category:", Short_Label, "<br>Value:", vSize),
type = "treemap",
textinfo = "label+text",
marker = list(
colorscale = list(c(0, 1), c("#d9f2d9", "#006400")), # Light to Dark Green
cmin = min(treemap_imports$tm$vSize),
cmax = max(treemap_imports$tm$vSize),
colorbar = list(title = "Trade Value")
)
) %>%
layout(title = "Imports Treemap - 2024")
p1_interactive
p2_interactive
```
:::
### 4.3 Time series analysis
#### 4.3.1 Trade in Services (Exports) - 2000 to 2012 to 2024
Key observations:
- **Rapid Growth:** Singapore’s service exports have grown significantly, reinforcing its role as a global trade hub.
- **Leading Sectors**: Transport and Other Business Services dominate, showing the steepest growth.
- **Financial & Digital Expansion: Financial Services, Telecom & IT, and Intellectual Property** have surged, driven by digitalization and economic policies.
- **Resilience & Acceleration:** Despite slower growth around 2012 (likely due to economic shocks), post-2018 expansion highlights policy-driven trade growth.
::: callout-note
- Singapore's services exports continue to thrive, reflecting its strong economic positioning
:::
::: panel-tabset
## Time series()
```{r, echo=FALSE, fig.width=8, fig.height=6}
# Load required libraries
library(dplyr)
library(tidyr)
library(CGPfunctions)
# Ensure column names are trimmed properly
colnames(final_trade_df) <- trimws(colnames(final_trade_df))
# Shorten long category labels for better readability (Fixed repetitive label issue)
final_trade_df$`Data Series` <- gsub("Manufacturing Services On Physical Inputs Owned By Others", "Manuf. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Maintenance And Repair Services", "Maintenance & Repair", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Government Goods And Services", "Govt. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("^Financial$", "Financial Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Telecommunications, Computer & Information", "Telecom & IT", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Charges For The Use Of Intellectual Property", "Intellectual Property", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Personal, Cultural And Recreational", "Cultural & Recreational", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Other Business Services", "Other Biz Services", final_trade_df$`Data Series`)
# Filter for Exports only
exports_df <- final_trade_df %>% filter(`Trade Type` == "Exports")
# Select relevant columns: Data Series, 2000, 2012, 2024
exports_slope <- exports_df %>%
select(`Data Series`, `2000`, `2012`, `2024`) %>%
drop_na()
# Keep only the top 10 services in 2024 for clarity
exports_slope <- exports_slope %>% arrange(desc(`2024`)) %>% head(10)
# Convert data to long format for `newggslopegraph()`
exports_long <- exports_slope %>%
pivot_longer(cols = c(`2000`, `2012`, `2024`), names_to = "Year", values_to = "Value")
# Convert Year to character (Fix for newggslopegraph)
exports_long$Year <- as.character(exports_long$Year)
# Increase figure height to improve label spacing
options(repr.plot.width = 14, repr.plot.height = 12)
# Create slopegraph with **2000, 2012, and 2024**
newggslopegraph(dataframe = exports_long,
Times = Year,
Measurement = Value,
Grouping = `Data Series`,
Title = "Trade in Services (Exports) - Slopegraph (2000-2012-2024)",
SubTitle = "Top 10 Services by Export Value",
LineThickness = 1.0,
DataTextSize = 2.5)
```
## Code()
```{r}
#| eval = FALSE
#| fig-width: 8
#| fig-height: 6
# Load required libraries
library(dplyr)
library(tidyr)
library(CGPfunctions)
# Ensure column names are trimmed properly
colnames(final_trade_df) <- trimws(colnames(final_trade_df))
# Shorten long category labels for better readability (Fixed repetitive label issue)
final_trade_df$`Data Series` <- gsub("Manufacturing Services On Physical Inputs Owned By Others", "Manuf. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Maintenance And Repair Services", "Maintenance & Repair", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Government Goods And Services", "Govt. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("^Financial$", "Financial Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Telecommunications, Computer & Information", "Telecom & IT", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Charges For The Use Of Intellectual Property", "Intellectual Property", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Personal, Cultural And Recreational", "Cultural & Recreational", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Other Business Services", "Other Biz Services", final_trade_df$`Data Series`)
# Filter for Exports only
exports_df <- final_trade_df %>% filter(`Trade Type` == "Exports")
# Select relevant columns: Data Series, 2000, 2012, 2024
exports_slope <- exports_df %>%
select(`Data Series`, `2000`, `2012`, `2024`) %>%
drop_na()
# Keep only the top 10 services in 2024 for clarity
exports_slope <- exports_slope %>% arrange(desc(`2024`)) %>% head(10)
# Convert data to long format for `newggslopegraph()`
exports_long <- exports_slope %>%
pivot_longer(cols = c(`2000`, `2012`, `2024`), names_to = "Year", values_to = "Value")
# Convert Year to character (Fix for newggslopegraph)
exports_long$Year <- as.character(exports_long$Year)
# Increase figure height to improve label spacing
options(repr.plot.width = 14, repr.plot.height = 12)
# Create slopegraph with **2000, 2012, and 2024**
newggslopegraph(dataframe = exports_long,
Times = Year,
Measurement = Value,
Grouping = `Data Series`,
Title = "Trade in Services (Exports) - Slopegraph (2000-2012-2024)",
SubTitle = "Top 10 Services by Export Value",
LineThickness = 1.0,
DataTextSize = 2.5)
```
:::
#### 4.3.2 Trade in Services (Imports) - 2000 to 2012 to 2024
Key observations:
- **Strong Growth in Imports:** Similar to exports, Singapore’s service imports have grown significantly, reflecting its global trade connectivity.
- **Transport & Other Business Services Lead:** These two sectors dominate imports, mirroring export trends, but at slightly lower values.
- **Rising Financial Services & Telecom & IT:** These sectors have seen notable import growth, indicating increasing reliance on foreign expertise and digital services.
- **Higher Import Dependency in Intellectual Property:** Compared to exports, imports in Intellectual Property have increased faster, suggesting rising licensing and royalty payments.
::: callout-note
- Singapore’s service imports align with its export-driven economy, with strong growth in digital services, finance, and transport.
:::
::: panel-tabset
## Time series()
```{r, echo=FALSE, fig.width=8, fig.height=6}
# Load required libraries
library(dplyr)
library(tidyr)
library(CGPfunctions)
# Ensure column names are trimmed properly
colnames(final_trade_df) <- trimws(colnames(final_trade_df))
# Shorten long category labels for better readability (Fixed repetitive label issue)
final_trade_df$`Data Series` <- gsub("Manufacturing Services On Physical Inputs Owned By Others", "Manuf. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Maintenance And Repair Services", "Maintenance & Repair", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Government Goods And Services", "Govt. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("^Financial$", "Financial Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Telecommunications, Computer & Information", "Telecom & IT", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Charges For The Use Of Intellectual Property", "Intellectual Property", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Personal, Cultural And Recreational", "Cultural & Recreational", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Other Business Services", "Other Biz Services", final_trade_df$`Data Series`)
# Filter for Imports only
imports_df <- final_trade_df %>% filter(`Trade Type` == "Imports")
# Select relevant columns: Data Series, 2000, 2012, 2024
imports_slope <- imports_df %>%
select(`Data Series`, `2000`, `2012`, `2024`) %>%
drop_na()
# Keep only the top 10 services in 2024 for clarity
imports_slope <- imports_slope %>% arrange(desc(`2024`)) %>% head(10)
# Convert data to long format for `newggslopegraph()`
imports_long <- imports_slope %>%
pivot_longer(cols = c(`2000`, `2012`, `2024`), names_to = "Year", values_to = "Value")
# Convert Year to character (Fix for newggslopegraph)
imports_long$Year <- as.character(imports_long$Year)
# Increase figure height to improve label spacing
options(repr.plot.width = 14, repr.plot.height = 12)
# Create slopegraph with **2000, 2012, and 2024**
newggslopegraph(dataframe = imports_long,
Times = Year,
Measurement = Value,
Grouping = `Data Series`,
Title = "Trade in Services (Imports) - Slopegraph (2000-2012-2024)",
SubTitle = "Top 10 Services by Import Value",
LineThickness = 1.0,
DataTextSize = 2.5)
```
## Code()
```{r}
#| eval = FALSE
#| fig-width: 8
#| fig-height: 6
# Load required libraries
library(dplyr)
library(tidyr)
library(CGPfunctions)
# Ensure column names are trimmed properly
colnames(final_trade_df) <- trimws(colnames(final_trade_df))
# Shorten long category labels for better readability (Fixed repetitive label issue)
final_trade_df$`Data Series` <- gsub("Manufacturing Services On Physical Inputs Owned By Others", "Manuf. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Maintenance And Repair Services", "Maintenance & Repair", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Government Goods And Services", "Govt. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("^Financial$", "Financial Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Telecommunications, Computer & Information", "Telecom & IT", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Charges For The Use Of Intellectual Property", "Intellectual Property", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Personal, Cultural And Recreational", "Cultural & Recreational", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Other Business Services", "Other Biz Services", final_trade_df$`Data Series`)
# Filter for Imports only
imports_df <- final_trade_df %>% filter(`Trade Type` == "Imports")
# Select relevant columns: Data Series, 2000, 2012, 2024
imports_slope <- imports_df %>%
select(`Data Series`, `2000`, `2012`, `2024`) %>%
drop_na()
# Keep only the top 10 services in 2024 for clarity
imports_slope <- imports_slope %>% arrange(desc(`2024`)) %>% head(10)
# Convert data to long format for `newggslopegraph()`
imports_long <- imports_slope %>%
pivot_longer(cols = c(`2000`, `2012`, `2024`), names_to = "Year", values_to = "Value")
# Convert Year to character (Fix for newggslopegraph)
imports_long$Year <- as.character(imports_long$Year)
# Increase figure height to improve label spacing
options(repr.plot.width = 14, repr.plot.height = 12)
# Create slopegraph with **2000, 2012, and 2024**
newggslopegraph(dataframe = imports_long,
Times = Year,
Measurement = Value,
Grouping = `Data Series`,
Title = "Trade in Services (Imports) - Slopegraph (2000-2012-2024)",
SubTitle = "Top 10 Services by Import Value",
LineThickness = 1.0,
DataTextSize = 2.5)
```
:::
## 5. DataVis makeover 3
### 5.1 Makeover of - Major Trading Partners for Trade in Services, 2023
#### 5.1.1 Original visualization
The original visualizations analyzed in this analysis are sourced from [here](https://www.singstat.gov.sg/modules/infographics/singapore-international-trade).
{width="523"}
{width="523"}
#### 5.1.2 Evaluation of the original visualization
In the below section, we will evaluate the effectiveness of the above visualization by identifying its pros and cons, focusing on aspects such as (1) **clarity**, and (2) **visual appeal**:
- **Clarity**: How well the data is presented and understood
| Pros | Cons | Suggested fixes |
|------------------------|------------------------|------------------------|
| **Clear regional differentiation** - The color-coded regions (North America, Europe, Asia, Oceania) make it easier to distinguish areas. | **Color scheme inconsistency** - The colors used for regions in the world map do not match the second visualization, leading to confusion. | **Ensure consistent colors** across both graphics (e.g., North America should be red in both images). |
| **Easy country identification -** The use of country flags provides quick recognition of major trading partners on the world map. | **Lack of legend for flags** - It is unclear whether the flags represent top import/export partners or something else. | **Include a small legend** explaining the significance of the flags. |
| **Well-placed title** - The main title is clear and ensures users immediately understand the topic. | **World map does not indicate rankings -** It is difficult to determine the largest trading partners. | **Add ranking indicators -** e.g.: numbered markers, or a gradient effect on the map. |
| | **Need to scroll to see rankings -** Users must scroll down to understand that U.S is the top trading partner, followed by EU, and China. | **Overlay rankings on the world map** so users can see top trading partners at a glance. |
| | **Lack of clear distinction between Regions, Countries, and Economic & Political Unions** - The visualization presents them together, making it difficult to see how rankings differ. | **Introduce clear labels or grouping** for Regions, Countries, and Economic & Political Unions in both the world map and bar chart. |
- **Visual appeal**: How visually engaging and effective the design is
| Pros | Cons | Suggested fixes |
|------------------------|------------------------|------------------------|
| **Engaging world map design** - The dotted world map with vibrant colors creates a modern and appealing look. | **Color inconsistency between the two visuals** - North America is red in the map but blue in the bar chart. | **Use a unified color scheme** so regions in both visuals match. |
### 5.2 Makeover of the original data visualization
#### 5.2.1 Data wrangling & data preparation
##### 5.2.1.1 Importing the data
The code chunk below imports the (1) [Exports of Services by Major Trading Partner datasets](https://tablebuilder.singstat.gov.sg/table/TS/M060201), and (2) [Imports of Services by Major Trading Partner datasets](https://tablebuilder.singstat.gov.sg/table/TS/M060211#!) downloaded from the [Department of Statistics Singapore (DOS)](https://www.singstat.gov.sg/), using the `read_excel()` function from the `readxl` package. These datasets contain trade value of exports and imports of services by major trading partner across various 2000 to 2023, which will be processed and analyzed in subsequent steps.
```{r}
export_partners <- read_excel("data/Exports Of Services By Major Trading Partner_base.xlsx")
```
```{r}
import_partners <- read_excel("data/Imports Of Services By Major Trading Partner_base.xlsx")
```
##### 5.2.1.2 Understanding the `export_partners` data structure
- `glimpse()`: provides a transposed overview of a dataset, showing variables and their types in a concise format.
- `head()`: displays the first few rows of a dataset (default is 6 rows) to give a quick preview of the data.
- `summary()`: generates a statistical summary of each variable, including measures like mean, median, and range for numeric data.
- `duplicated()`:returns a logical vector indicating which elements or rows in a vector or data frame are duplicates.
- `colSums(is.na())`: counts the number of missing values (NA) in each column of the data frame.
- `str()`: use `str()` to display the column names, data types, and a preview of the data.
:::::: panel-tabset
## glimpse()
```{r}
glimpse(export_partners)
```
## head()
```{r}
head(export_partners)
```
## summary()
```{r}
summary(export_partners)
```
## duplicated()
```{r}
export_partners[duplicated(export_partners),]
```
::: callout-note
- Ensure that there are no duplicated columns, if not will have to investigate further.
:::
## colSum(is.na())
```{r}
colSums(is.na(export_partners))
```
::: callout-note
- Ensure that there are no NA values, if not will have to investigate further.
- Possibility to use the `replace_na()` function to replace missing values with 0 in specified columns.
:::
## str())
```{r}
str(export_partners)
```
::: callout-note
- Ensure that all variables are correctly classified by data type; recast variable types if needed.
- Variables are correctly classified - where categorical variables are classified as **character**, while continuous variables are classified as **double**.
:::
::::::
##### 5.2.1.3 Understanding the `import_partners` data structure
:::::: panel-tabset
## glimpse()
```{r}
glimpse(import_partners)
```
## head()
```{r}
head(import_partners)
```
## summary()
```{r}
summary(import_partners)
```
## duplicated()
```{r}
import_partners[duplicated(import_partners),]
```
::: callout-note
- Ensure that there are no duplicated columns, if not will have to investigate further.
:::
## colSum(is.na())
```{r}
colSums(is.na(import_partners))
```
::: callout-note
- Ensure that there are no NA values, if not will have to investigate further.
- Possibility to use the `replace_na()` function to replace missing values with 0 in specified columns.
:::
## str())
```{r}
str(import_partners)
```
::: callout-note
- Ensure that all variables are correctly classified by data type; recast variable types if needed.
- Variables are correctly classified - where categorical variables are classified as **character**, while continuous variables are classified as **double**.
:::
::::::
##### 5.2.1.4 Combining the `export_partners` and `import_partners`
To ensure consistency, column names were converted to lowercase, and spaces were replaced with underscores. Since numerical values were stored as different data types in the datasets, all year columns (2000-2023) were converted to numeric format to prevent data type mismatches. Additionally, any missing (NA) values or dashes ("-") were replaced with 0 to ensure data integrity. A "category" column was added to distinguish between Exports and Imports. Finally, the two datasets were combined using `bind_rows()`.
::: callout-important
- Year columns (2000-2023) in both Exports and Imports dataset were converted to numeric format to ensure consistency and prevent data type mismatches.
- There are missing (NA) values and "-" characters in the dataset.
- Replace NA values and "-" with 0 using the mutate() and replace() functions.
:::
```{r}
# Ensure column names are consistent
colnames(export_partners) <- tolower(gsub(" ", "_", colnames(export_partners)))
colnames(import_partners) <- tolower(gsub(" ", "_", colnames(import_partners)))
# Convert all numeric year columns to numeric type safely by handling non-numeric values and replacing NA and "-" with 0
export_partners <- export_partners %>% mutate(across(matches("^\\d{4}$"), ~suppressWarnings(as.numeric(.)))) %>%
mutate(across(matches("^\\d{4}$"), ~replace(., is.na(.) | . == "-", 0)))
import_partners <- import_partners %>% mutate(across(matches("^\\d{4}$"), ~suppressWarnings(as.numeric(.)))) %>%
mutate(across(matches("^\\d{4}$"), ~replace(., is.na(.) | . == "-", 0)))
# Add a column to indicate the category (Exports, Imports)
export_partners <- export_partners %>% mutate(category = "Exports")
import_partners <- import_partners %>% mutate(category = "Imports")
# Combine both datasets
combined_data <- bind_rows(export_partners, import_partners)
```
##### 5.2.1.5 Understanding the `combined_data` data structure
The `combined_data` tibble contains 26 attributes, as shown below.
The following preprocessing checks were conducted as part of data preparation:
::: callout-tip
## Preprocessing Checks
- Verified that the correct data types were loaded in the `combined_data` dataset using `glimpse()` and `str()`
- Ensured there were no duplicate variable names using `duplicated()` in the dataset
- Checked for missing values using `colSums(is.na())`
:::
:::::: panel-tabset
## glimpse()
```{r}
glimpse(combined_data)
```
## head()
```{r}
head(combined_data)
```
## summary()
```{r}
summary(combined_data)
```
## duplicated()
```{r}
combined_data[duplicated(combined_data),]
```
::: callout-note
- Ensure that there are no duplicated columns, if not will have to investigate further.
:::
## colSum(is.na())
```{r}
colSums(is.na(combined_data))
```
::: callout-note
- Ensure that there are no NA values, if not will have to investigate further.
- Possibility to use the `replace_na()` function to replace missing values with 0 in specified columns.
:::
## str())
```{r}
str(combined_data)
```
::: callout-note
- Ensure that all variables are correctly classified by data type; recast variable types if needed.
- Variables are correctly classified - where categorical variables are classified as **character**, while continuous variables are classified as **double**.
:::
::::::
##### 5.2.1.6 Categorizing trading partners
A new "category" column will be added to classify each trading partner as a **Country**, **Region**, or **Economic & Political Union**. This helps organize the data, making it easier to analyze trade patterns across different entity types.
| types | data_series |
|------------------------------------|------------------------------------|
| Country | Bangladesh, Brunei Darussalam, Cambodia, Hong Kong, India, Indonesia, Israel, Japan, Kuwait, Mainland China, Malaysia, Myanmar, Pakistan, Philippines, Qatar, Republic Of Korea, Saudi Arabia, Sri Lanka, Taiwan Thailand,Turkiye United Arab Emirates, Vietnam, Belgium, Cyprus Denmark, Finland France Germany, Greece Ireland Italy, Luxembourg, Netherlands, Norway, Portugal, Russian Federation, Spain, Sweden, Switzerland, United Kingdom, United States Of America, Canada, Australia, Marshall Islands, New Zealand, Papua New Guinea, Bermuda, Brazil, British Virgin Islands, Cayman Islands, Chile, Mexico Panama, Peru, Egypt, Liberia, Mauritius, Nigeria South Africa |
| Region | Asia, Europe, North America, Oceania, South And Central America And The Caribbean, Africa |
| Economic & Political Union | ASEAN, European Union (EU-27) |
```{r}
# Load the categorization mapping from Excel
categorization_mapping <- read_excel("data/categorizationmapping.xlsx")
# Ensure column names are consistent
colnames(categorization_mapping) <- tolower(gsub(" ", "_", colnames(categorization_mapping)))
# Merge only the type into combined_data
combined_data <- combined_data %>% left_join(categorization_mapping, by = "data_series")
```
##### 5.2.1.7 Aggregating the trade values across the years
To calculate the total trade values for each trading partner, the dataset was grouped by "data_series", and trade values from 2000 to 2023 were summed using `summarise()`, ensuring missing values were ignored with `na.rm = TRUE`. A "category" column was assigned the value "Total" to indicate aggregated trade figures. The computed totals were then appended to the original dataset using `bind_rows()`.
```{r}
# Summing up the values for each trading partner across all years (2000-2023)
summed_data <- combined_data %>%
group_by(data_series, types) %>%
summarise(across(matches("^\\d{4}$"), sum, na.rm = TRUE), .groups = "drop") %>%
mutate(category = "Total")
# Add the total category to the combined dataset
final_data <- bind_rows(combined_data, summed_data)
# Reorder columns to move category to the second position
final_data <- final_data %>% select(data_series, category, types, everything())
# Display a preview of the merged data
print(head(final_data))
# library(openxlsx)
#
# write.xlsx(final_data, "final_data.xlsx")
```
#### 5.2.2 Revised visualization - pie chart \[region\]
Key makeover changes:
1️⃣ **Breaking down trade data by types - e.g.: countries/ regions/ economic & political union**
- Original visualization grouped countries, regions, economic & political unions making trade relationships harder to interpret.
- Revised visualization will categorize data into different types, ensuring clarity.
2️⃣ **Pie chart for region-level aggregation:**
- **Pie chart groups trade volumes by region** –\>Provide a high-level trade distribution which allow users to quickly assess which regions contribute the most to total trade.
::: callout-note
- Easier comparison --\> Highlights contributions from countries, regions, and unions seperately.
- Pie chart only showcase the **Major Trading Partners for Trade in Services 2023** - Total Trade Volume Distribution **by Region**
:::
::: panel-tabset
## Graph()
```{r}
#| echo = FALSE
library(ggplot2)
library(dplyr)
library(viridis)
library(ggrepel)
# Filter the trade data for regions
region_trade_data <- final_data %>%
filter(types == "Region", category == "Total") %>%
select(region = data_series, trade_value = `2023`)
# Ensure correct ordering based on trade value
region_trade_data <- region_trade_data %>%
arrange(desc(trade_value))
# Compute cumulative sum for positioning
region_trade_data <- region_trade_data %>%
mutate(label = paste0(region, "\n$", format(trade_value, big.mark = ",")),
pos = cumsum(trade_value) - (0.5 * trade_value))
# Create Static Pie Chart with Labels Outside
static_pie_chart <- ggplot(region_trade_data, aes(x = "", y = trade_value, fill = trade_value)) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar(theta = "y") +
scale_fill_viridis(option = "viridis", direction = -1) +
geom_text_repel(aes(y = pos, label = label), size = 3, color = "black", nudge_x = 1, box.padding = 0.5) +
theme_void() +
labs(
title = "Major Trading Partners for Trade in Services, 2023",
subtitle = "Total Trade Volume Distribution by Region",
fill = "Trade Value (SGD)"
) +
theme(
legend.position = "right",
plot.title = element_text(hjust = 0, face = "bold", size = 14), # Centered, bold title
plot.subtitle = element_text(hjust = 0, size = 12) # Centered subtitle
)
# Show the static pie chart
print(static_pie_chart)
```
## Code()
```{r}
#| eval = FALSE
library(ggplot2)
library(dplyr)
library(viridis)
library(ggrepel)
# Filter the trade data for regions
region_trade_data <- final_data %>%
filter(types == "Region", category == "Total") %>%
select(region = data_series, trade_value = `2023`)
# Ensure correct ordering based on trade value
region_trade_data <- region_trade_data %>%
arrange(desc(trade_value))
# Compute cumulative sum for positioning
region_trade_data <- region_trade_data %>%
mutate(label = paste0(region, "\n$", format(trade_value, big.mark = ",")),
pos = cumsum(trade_value) - (0.5 * trade_value))
# Create Static Pie Chart with Labels Outside
static_pie_chart <- ggplot(region_trade_data, aes(x = "", y = trade_value, fill = trade_value)) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar(theta = "y") +
scale_fill_viridis(option = "viridis", direction = -1) +
geom_text_repel(aes(y = pos, label = label), size = 3, color = "black", nudge_x = 1, box.padding = 0.5) +
theme_void() +
labs(
title = "Major Trading Partners for Trade in Services, 2023",
subtitle = "Total Trade Volume Distribution by Region",
fill = "Trade Value (SGD)"
) +
theme(
legend.position = "right",
plot.title = element_text(hjust = 0, face = "bold", size = 14), # Centered, bold title
plot.subtitle = element_text(hjust = 0, size = 12) # Centered subtitle
)
# Show the static pie chart
print(static_pie_chart)
```
:::
#### 5.2.3 Revised visualization - worldmap \[countries\]
Key makeover changes:
1️⃣ **Enhanced trade value representation with a geo-spatial context** -
- Original uses static infographics with flag-based indicators, the revised visualization presents an intecative world map, allowing for dynamic exploration.
- Color gradient (from yellow to purple) effectively communciates the trade value inensity in SGD
::: panel-tabset
## Graph()
```{r}
#| echo = FALSE
library(ggplot2)
library(dplyr)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
library(viridis)
library(plotly)
# Load world map data
world <- ne_countries(scale = "medium", returnclass = "sf")
# Filter the trade data (assuming it's already in your R environment as `final_data`)
trade_data <- final_data %>%
filter(types == "Country", category == "Total") %>%
select(region = data_series, trade_value = `2023`)
# Standardize region names
trade_data$region <- tolower(trimws(trade_data$region))
world$name <- tolower(trimws(world$name))
# Correct mismatched country names
trade_data$region <- recode(trade_data$region,
"mainland china" = "china",
"republic of korea" = "south korea",
"russian federation" = "russia",
"turkiye" = "turkey",
"british virgin islands" = "u.s. virgin is.",
"cayman islands" = "cayman is.",
"brunei darussalam" = "brunei",
"marshall islands" = "marshall is.")
# Merge world map with trade data
world <- world %>%
left_join(trade_data, by = c("name" = "region"))
# Create a ggplot object with title and subtitle
ggplot_map <- ggplot(data = world) +
geom_sf(aes(fill = trade_value, text = paste("Country:", name, "<br>Trade Value: $", trade_value)),
color = "black", size = 0.2) +
scale_fill_viridis(option = "viridis", direction = -1, na.value = "gray90") +
theme_minimal() +
labs(
title = "Major Trading Partners for Trade in Services, 2023",
subtitle = "Total Trade Value Distribution by Countries",
fill = "Trade Value (SGD)"
) +
theme(
plot.title = element_text(hjust = 0, face = "bold", size = 14),
plot.subtitle = element_text(hjust = 0, size = 12)
)
# Convert to interactive Plotly map and manually add subtitle
plotly_map <- ggplotly(ggplot_map, tooltip = "text") %>%
layout(
annotations = list(
list(
text = "<b>Total Trade Value Distribution by Countries</b>",
x = 0, # Left align
y = 1.03, # Position above plot
xref = "paper",
yref = "paper",
showarrow = FALSE,
font = list(size = 12)
)
)
)
# Show interactive map
plotly_map
```
## Code()
```{r}
#| eval = FALSE
library(ggplot2)
library(dplyr)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
library(viridis)
library(plotly)
# Load world map data
world <- ne_countries(scale = "medium", returnclass = "sf")
# Filter the trade data (assuming it's already in your R environment as `final_data`)
trade_data <- final_data %>%
filter(types == "Country", category == "Total") %>%
select(region = data_series, trade_value = `2023`)
# Standardize region names
trade_data$region <- tolower(trimws(trade_data$region))
world$name <- tolower(trimws(world$name))
# Correct mismatched country names
trade_data$region <- recode(trade_data$region,
"mainland china" = "china",
"republic of korea" = "south korea",
"russian federation" = "russia",
"turkiye" = "turkey",
"british virgin islands" = "u.s. virgin is.",
"cayman islands" = "cayman is.",
"brunei darussalam" = "brunei",
"marshall islands" = "marshall is.")
# Merge world map with trade data
world <- world %>%
left_join(trade_data, by = c("name" = "region"))
# Create a ggplot object with title and subtitle
ggplot_map <- ggplot(data = world) +
geom_sf(aes(fill = trade_value, text = paste("Country:", name, "<br>Trade Value: $", trade_value)),
color = "black", size = 0.2) +
scale_fill_viridis(option = "viridis", direction = -1, na.value = "gray90") +
theme_minimal() +
labs(
title = "Major Trading Partners for Trade in Services, 2023",
subtitle = "Total Trade Value Distribution by Countries",
fill = "Trade Value (SGD)"
) +
theme(
plot.title = element_text(hjust = 0, face = "bold", size = 14),
plot.subtitle = element_text(hjust = 0, size = 12)
)
# Convert to interactive Plotly map and manually add subtitle
plotly_map <- ggplotly(ggplot_map, tooltip = "text") %>%
layout(
annotations = list(
list(
text = "<b>Total Trade Value Distribution by Countries</b>",
x = 0, # Left align
y = 1.03, # Position above plot
xref = "paper",
yref = "paper",
showarrow = FALSE,
font = list(size = 12)
)
)
)
# Show interactive map
plotly_map
```
:::
### 5.3 Time series analysis
Key observations:
- **Total trade vol()**
- **Top 5 trade partners:** United States, Japan, Mainland China, Australia, and United Kingdom
- **Trends:** Significant growth post-2010, with accelerated trade activities from 2018 onward, with Asia and North America dominating trade volume.
- **Exports()**
- **Top 5 export destinations:** United States, Japan, Australia, Mainland China, and United Kingdom
- **Imports()**
- **Top 5 import destinations:** United States, Mainland China, Japan, United Kingdom, and Hong Kong.
::: callout-note
- United States, Mainland China, Japan, and Australia consistently reman Singapore's top trade partners in services
- Exports and imports have surged post-2018, reinforcing Singapore's role as a global trade hub
- Digitalization and financial sector growth are key factors shaping the country trade landscape
:::
::: panel-tabset
## Total trade vol()
```{r, fig.height=10, fig.width=8}
#| echo = FALSE
library(ggplot2)
library(dplyr)
library(tidyr)
library(viridis)
# Filter trade data for countries
trade_data <- final_data %>%
filter(types == "Country", category == "Total") %>%
select(country = data_series, `2000`:`2023`) # Select years from 2000 to 2023
# Reshape data from wide to long format
trade_data_long <- trade_data %>%
pivot_longer(cols = `2000`:`2023`, names_to = "year", values_to = "trade_value") %>%
mutate(year = as.numeric(year)) # Convert year to numeric
# Calculate total trade value per country and reorder factors (highest at the top)
trade_data_long <- trade_data_long %>%
group_by(country) %>%
mutate(total_trade_value = sum(trade_value, na.rm = TRUE)) %>%
ungroup() %>%
mutate(country = fct_reorder(country, total_trade_value, .desc = TRUE))
# Create heatmap-style plot using geom_tile()
hori_plot <- ggplot(trade_data_long, aes(x = year, y = country, fill = trade_value)) +
geom_tile(color = "white") + # Add white borders between tiles
scale_fill_viridis_c(option = "magma", na.value = "gray90") + # Color gradient for intensity
labs(
title = "Total Trade Value Trends (2000-2023)",
subtitle = "Major Trading Partners for Trade in Services",
x = "Year",
y = "Country",
fill = "Trade Value (SGD)"
) +
theme_minimal() +
theme(
axis.text.y = element_text(size = 8), # Adjust country label size
axis.text.x = element_text(angle = 45, hjust = 1), # Rotate year labels
plot.title = element_text(hjust = 0, face = "bold", size = 14), # Left-aligned title
plot.subtitle = element_text(hjust = 0, size = 12) # Left-aligned subtitle
)
# Show plot
print(hori_plot)
```
## Export()
```{r, fig.height=10, fig.width=8}
#| echo = FALSE
library(ggplot2)
library(dplyr)
library(tidyr)
library(viridis)
# Filter trade data for countries
trade_data <- final_data %>%
filter(types == "Country", category == "Exports") %>%
select(country = data_series, `2000`:`2023`) # Select years from 2000 to 2023
# Reshape data from wide to long format
trade_data_long <- trade_data %>%
pivot_longer(cols = `2000`:`2023`, names_to = "year", values_to = "trade_value") %>%
mutate(year = as.numeric(year)) # Convert year to numeric
# Calculate total trade value per country and reorder factors (highest at the top)
trade_data_long <- trade_data_long %>%
group_by(country) %>%
mutate(total_trade_value = sum(trade_value, na.rm = TRUE)) %>%
ungroup() %>%
mutate(country = fct_reorder(country, total_trade_value, .desc = TRUE))
# Create heatmap-style plot using geom_tile()
hori_plot <- ggplot(trade_data_long, aes(x = year, y = country, fill = trade_value)) +
geom_tile(color = "white") + # Add white borders between tiles
scale_fill_viridis_c(option = "magma", na.value = "gray90") + # Color gradient for intensity
labs(
title = "Export Trade Value Trends (2000-2023)",
subtitle = "Major Trading Partners for Trade in Services",
x = "Year",
y = "Country",
fill = "Trade Value (SGD)"
) +
theme_minimal() +
theme(
axis.text.y = element_text(size = 8), # Adjust country label size
axis.text.x = element_text(angle = 45, hjust = 1), # Rotate year labels
plot.title = element_text(hjust = 0, face = "bold", size = 14), # Left-aligned title
plot.subtitle = element_text(hjust = 0, size = 12) # Left-aligned subtitle
)
# Show plot
print(hori_plot)
```
## Import()
```{r, fig.height=10, fig.width=8}
#| echo = FALSE
library(ggplot2)
library(dplyr)
library(tidyr)
library(viridis)
# Filter trade data for countries
trade_data <- final_data %>%
filter(types == "Country", category == "Imports") %>%
select(country = data_series, `2000`:`2023`) # Select years from 2000 to 2023
# Reshape data from wide to long format
trade_data_long <- trade_data %>%
pivot_longer(cols = `2000`:`2023`, names_to = "year", values_to = "trade_value") %>%
mutate(year = as.numeric(year)) # Convert year to numeric
# Calculate total trade value per country and reorder factors (highest at the top)
trade_data_long <- trade_data_long %>%
group_by(country) %>%
mutate(total_trade_value = sum(trade_value, na.rm = TRUE)) %>%
ungroup() %>%
mutate(country = fct_reorder(country, total_trade_value, .desc = TRUE))
# Create heatmap-style plot using geom_tile()
hori_plot <- ggplot(trade_data_long, aes(x = year, y = country, fill = trade_value)) +
geom_tile(color = "white") + # Add white borders between tiles
scale_fill_viridis_c(option = "magma", na.value = "gray90") + # Color gradient for intensity
labs(
title = "Import Trade Value Trends (2000-2023)",
subtitle = "Major Trading Partners for Trade in Services",
x = "Year",
y = "Country",
fill = "Trade Value (SGD)"
) +
theme_minimal() +
theme(
axis.text.y = element_text(size = 8), # Adjust country label size
axis.text.x = element_text(angle = 45, hjust = 1), # Rotate year labels
plot.title = element_text(hjust = 0, face = "bold", size = 14), # Left-aligned title
plot.subtitle = element_text(hjust = 0, size = 12) # Left-aligned subtitle
)
# Show plot
print(hori_plot)
```
## Code()
```{r}
#| eval = FALSE
library(ggplot2)
library(dplyr)
library(tidyr)
library(viridis)
# Filter trade data for countries
trade_data <- final_data %>%
filter(types == "Country", category == "Imports") %>%
select(country = data_series, `2000`:`2023`) # Select years from 2000 to 2023
# Reshape data from wide to long format
trade_data_long <- trade_data %>%
pivot_longer(cols = `2000`:`2023`, names_to = "year", values_to = "trade_value") %>%
mutate(year = as.numeric(year)) # Convert year to numeric
# Calculate total trade value per country and reorder factors (highest at the top)
trade_data_long <- trade_data_long %>%
group_by(country) %>%
mutate(total_trade_value = sum(trade_value, na.rm = TRUE)) %>%
ungroup() %>%
mutate(country = fct_reorder(country, total_trade_value, .desc = TRUE))
# Create heatmap-style plot using geom_tile()
hori_plot <- ggplot(trade_data_long, aes(x = year, y = country, fill = trade_value)) +
geom_tile(color = "white") + # Add white borders between tiles
scale_fill_viridis_c(option = "magma", na.value = "gray90") + # Color gradient for intensity
labs(
title = "Total Trade Value Trends (2000-202f3)",
subtitle = "Major Trading Partners for Trade in Services",
x = "Year",
y = "Country",
fill = "Trade Value (SGD)"
) +
theme_minimal() +
theme(
axis.text.y = element_text(size = 8), # Adjust country label size
axis.text.x = element_text(angle = 45, hjust = 1), # Rotate year labels
plot.title = element_text(hjust = 0, face = "bold", size = 14), # Left-aligned title
plot.subtitle = element_text(hjust = 0, size = 12) # Left-aligned subtitle
)
# Show plot
print(hori_plot)
```
:::
### 5.4 Time series forecast
In the below code, we will explore the tidymodels approach in time series forecasting.
#### 5.4.1 Data sampling
```{r}
# Load all necessary packages
library(tidyverse)
library(modeltime)
library(timetk)
library(lubridate)
library(rsample)
library(parsnip)
library(recipes)
library(workflows)
library(yardstick)
# Convert wide format to long format
long_data <- final_data %>%
pivot_longer(cols = matches("^\\d{4}$"), names_to = "year", values_to = "value") %>%
mutate(year = as.integer(year))
# Convert year to Date format
long_data <- long_data %>%
mutate(date = make_date(year = year, month = 1, day = 1)) %>%
select(data_series, category, types, date, value)
# Filter for a specific country and only "Exports"
selected_series <- long_data %>%
filter(data_series == "United States Of America", category == "Exports") %>%
select(date, value) %>%
arrange(date)
# Remove duplicates if any
selected_series <- selected_series %>%
distinct(date, .keep_all = TRUE)
# Perform a time-based split (80% training, 20% testing)
splits <- initial_time_split(selected_series, prop = 0.8)
cat("The training dataset contains", nrow(training(splits)), "observations.\n")
cat("The testing dataset consists of", nrow(testing(splits)), "observations.\n")
```
#### 5.4.2 Create and fit multiple models
In the code below, we will fit four models: - Error-Trend-Season (ETS) model by using `exp_smoothing()` - Auto ARIMA model by using `arima_reg()` - Boosted Auto ARIMA by using `arima_boost()` - Prophet model by using `prophet_reg()`
```{r}
model_fit_ets <- exp_smoothing() %>%
set_engine("ets") %>%
fit(value ~ date, data = training(splits))
model_fit_arima <- arima_reg() %>%
set_engine("auto_arima") %>%
fit(value ~ date, data = training(splits))
model_fit_arima_boosted <- arima_boost(
min_n = 2,
learn_rate = 0.015) %>%
set_engine("auto_arima_xgboost") %>%
fit(value ~ date, data = training(splits))
model_fit_prophet <- prophet_reg() %>%
set_engine("prophet") %>%
fit(value ~ date, data = training(splits))
```
#### 5.4.3 Add fitted models to a Model Table
Next, we will use `modeltime_table` of **modeltime** package to add each of the models to a Modeltime Table.
```{r}
models_tbl <- modeltime_table(
model_fit_ets,
model_fit_arima,
model_fit_arima_boosted,
model_fit_prophet
)
print(models_tbl)
```
#### 5.4.4 Calibrate the model to a testing set
We will then use the `modeltime_calibrate()` to add a new column called .calibrate_data into the newly created models_tbl tibble data table.
```{r}
calibration_tbl <- models_tbl %>%
modeltime_calibrate(new_data = testing(splits))
print(calibration_tbl)
```
#### 5.4.5 Model accuracy assessment
We will use two way to assess the accuracy of the models - by (1) means of accuracy metrics, (2) visualization
##### 5.4.5.1 Means of accuracy metrics
`modeltime_accuracy()` of modeltime package is used compute the accuracy metrics. Then, `table_modeltime_accuracy()` is used to present the accuracy metrics in tabular form.
```{r}
calibration_tbl %>%
modeltime_accuracy() %>%
table_modeltime_accuracy(.interactive = FALSE)
```
##### 5.4.5.2 Visualization
We can also use the interactive plotly visualization to assess the accuracy of the models.
```{r}
calibration_tbl %>%
modeltime_forecast(new_data = testing(splits), actual_data = selected_series) %>%
plot_modeltime_forecast()
```
#### 5.4.6 Refit to full dataset & forecast forward
Next, we refit the models to the full dataset using `modeltime_refit()` and forecast them forward.We can use `modeltime_refit()` to refit the forecasting models with the full data.
Then, `modeltime_forecast()` is used to forecast to a selected future time period, in this example 10 years.
```{r}
refit_tbl <- calibration_tbl %>%
modeltime_refit(data = selected_series) # Now trained on full dataset
forecast_tbl <- refit_tbl %>%
modeltime_forecast(
h = 10, # Forecast for 10 years
actual_data = selected_series,
keep_data = TRUE # Keep historical data for better visualization
)
forecast_tbl %>%
plot_modeltime_forecast(
.legend_max_width = 25,
.interactive = TRUE,
.plotly_slider = TRUE
)
```
#### 5.4.7 Examine the model accuracy metrics after refitting the full data set
```{r}
refit_tbl %>%
modeltime_accuracy() %>%
table_modeltime_accuracy(.interactive = FALSE)
```
#### 5.4.8 Takeaway from model accuracy metrics
- **ARIMA(0,1,0)(0,0,1)\[5\] WITH DRIFT is the best-performing model** in this comparison, as it has the lowest error metrics and the highest R².
- **ETS(M,A,N) performed the worst**, with the highest error values.
- **PROPHET had moderate performance** but lower R², making it less reliable than ARIMA.